Abstract: We consider problems of access control for updates of XML documents. In the context of XML programming, types can be viewed as hedge automata, and static type checking amounts to verifying that a program always converts valid source documents into valid output documents. Given a set of update operations, we are particularly interested in checking safety properties such as the preservation of document types along any sequence of updates. We are also interested in the related policy consistency problem, that is, detecting whether a sequence of authorized operations can simulate a forbidden one. We reduce these questions to type checking problems, solved by computing variants of hedge automata characterizing the set of ancestors and descendants of the initial document type for the closure of parameterized rewrite rules.

Keywords: XML transformations, Typing, Software Verification, Tree Automata, Term Rewriting.
1 Introduction

XML has developed into the de facto standard for the exchange and manipulation of data on the Web [I]. XML documents are textual presentations of data stored in a tree structure, and are commonly represented as finite labeled unranked trees. In general, they are constrained by typing restrictions such as XML schemas expressing structural constraints on the organisation of the markup. Most of the typing formalisms currently used for XML are based on finite tree automata. Several formalisms exist for the specification of transformation functions for XML documents, e.g. for converting data from one source into a format suitable for a destination, for the automatic update of documents, or for the deletion of confidential data when enforcing an access control policy (wrapping or anonymization). Among these formalisms, the W3C XQuery Update Facility [I] defines some operations for document updates.

Applying transformation functions in the context of documents following type constraints defined by schemas raises several compatibility problems. Static type checking in the context of XML document processing amounts to verifying at compile time that every XML document resulting from a specified query or transformation of a document with a valid input type has a valid output type. The decidability of static type checking clearly depends on the expressive power of the types and transformations that are employed. A standard approach to XML type checking is forward (resp. backward) type inference, that is, the computation of an output (resp. input) XML type from a given input (resp. output) type and a tree transformation. The type checking itself can then be reduced to verifying an inclusion between the computed type and the given output or input type.

In this paper, motivated by XML access control problems, we consider document transformations that are arbitrary sequences of atomic update operations, and we address the problem of their type inference. Since update operations, besides relabeling document nodes, can create and delete entire XML fragments and thereby modify the structure of a document, it is not obvious how to check whether they preserve document types.

We propose a redefinition in terms of rewrite rules (Section 3.1) of the update operations of XACU [8], a formal model for specifying access control on XML data based on the W3C XQuery Update Facility draft [I]. For these operations, and some proposed extensions, we derive type inference algorithms that can also be employed to check the local consistency of access control policies (i.e. to determine that no sequence of allowed updates starting from a given document can achieve an explicitly forbidden update). Inconsistent policies may lead to serious security breaches and are challenging to detect according to [8]. Our results are obtained through the analysis of reachability sets of term rewriting systems for unranked trees, parametrized by hedge automata, and through the computation of an extension of hedge automata called context-free hedge automata. They may therefore give more insight into these notions, which have not been investigated before.

Related work: When considering general purpose transformation languages (e.g. XDuce, CDuce) for writing transformations, typechecking is generally undecidable and approximations must be applied. In order to obtain exact algorithms, several approaches define conveniently abstract formalisms for representing transformations. Let us cite for instance TL (the transformation language) [E], whose programs can be translated into macro tree transducers [3T], and k-pebble tree transducers [17], a powerful model defined so as to cover relevant fragments of XSLT [T^] and other XML transformation languages. Some restrictions on schema languages and on top-down tree transducers (on which transformations are based) have also been studied [TB] in order to obtain PTIME type checking procedures.

The results based on tree transducers are difficult to compare to ours. On the one hand, we consider a small class of atomic update operations whose expressiveness cannot be compared to that of general purpose transformation languages; on the other hand, the application of updates is not restricted by strategies such as the top-down transformations of [16]. One can note that work on typechecking generally focuses on the expressiveness of transformation languages, and is restricted to XML types modeled as regular tree languages (languages of tree automata) or DTDs (a strict subclass of regular tree languages). In our work we need to consider XML types that generalize regular tree languages and are recognized by context-free hedge automata [IT].

The first access control model for XML was proposed by \jg\ and was extended to secure updates in [3]. In [9], the authors propose a solution to secure XUpdate queries. Static analysis has been applied to XML access control in pjj] to determine whether a query expression is guaranteed not to access elements that are forbidden by the policy. In [5] the authors propose the XACU language. They study policy consistency and show that it is undecidable in their setting. On the positive side, [2] considers policies defined in terms of annotated non-recursive XML DTDs and gives a polynomial algorithm for checking consistency.

Several recent works have considered the application of rewriting to reason about access control policies. These works do not address XML access control.

Organization of the paper: We introduce the needed formal background about terms, hedge automata and rewriting systems in Section 2. Then we present XML updates as parameterized rewriting rules in Section 3. Finally we give applications to access control policies in Section 4.

2 Definitions 

2.1 Unranked Ordered Trees 

Terms and Hedges. We consider a finite alphabet $\Sigma$ and an infinite set of variables $\mathcal{X}$. The symbols of $\Sigma$ are generally denoted $a, b, c, \ldots$ and the variables of $\mathcal{X}$ are denoted $x, y, \ldots$ The set $\mathcal{H}(\Sigma, \mathcal{X})$ of hedges over $\Sigma$ and $\mathcal{X}$ is the set of finite (possibly empty) sequences of terms, where the set of terms over $\Sigma$ and $\mathcal{X}$ is $\mathcal{T}(\Sigma, \mathcal{X}) := \mathcal{X} \cup \{a(h) \mid a \in \Sigma, h \in \mathcal{H}(\Sigma, \mathcal{X})\}$. The empty sequence is denoted $()$ and when $h$ is empty, the term $a(h)$ is simply denoted by $a$. We will sometimes consider a term as a hedge of length one, i.e. consider that $\mathcal{T}(\Sigma, \mathcal{X}) \subset \mathcal{H}(\Sigma, \mathcal{X})$. A leaf of a hedge $(t_1 \ldots t_n)$ is a leaf of one of the terms $t_1, \ldots, t_n$.
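The definitions above can be mirrored by a small data structure. The following is only an illustrative sketch for ground hedges (variables are not modeled); the encoding as nested tuples and the helper names `term`, `show` and `leaves` are assumptions of this example, not notation from the paper.

```python
# A ground term is a pair (label, hedge); a hedge is a tuple of terms.
def term(label, *children):
    return (label, tuple(children))

def show(hedge):
    """Render a hedge in the a(b c) notation used in the text."""
    return " ".join(
        lab if not kids else f"{lab}({show(kids)})" for lab, kids in hedge
    )

def leaves(hedge):
    """Labels of the leaves of a hedge, from left to right."""
    out = []
    for lab, kids in hedge:
        out.extend([lab] if not kids else leaves(kids))
    return out

# The hedge (a(b c(d)) e) of length two.
doc = (term("a", term("b"), term("c", term("d"))), term("e"))
print(show(doc))
print(leaves(doc))
```

A term is used as a hedge of length one simply by wrapping it in a one-element tuple.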

The sets of ground terms (terms without variables) and ground hedges are respectively denoted $\mathcal{T}(\Sigma)$ and $\mathcal{H}(\Sigma)$. The set of variables occurring in a hedge $h \in \mathcal{H}(\Sigma, \mathcal{X})$ is denoted $var(h)$. A hedge $h \in \mathcal{H}(\Sigma, \mathcal{X})$ is called linear if every variable of $\mathcal{X}$ occurs at most once in $h$.
The root node of a term is denoted by $\Lambda$.

Substitutions. A substitution $\sigma$ is a mapping of finite domain from $\mathcal{X}$ into $\mathcal{H}(\Sigma, \mathcal{X})$. The application of a substitution $\sigma$ to terms and hedges (written with postfix notation) is defined recursively by $x\sigma := \sigma(x)$ when $x \in dom(\sigma)$, $y\sigma := y$ when $y \in \mathcal{X} \setminus dom(\sigma)$, $(t_1 \ldots t_n)\sigma := (t_1\sigma \ldots t_n\sigma)$ for $n \geq 0$, and $f(h)\sigma := f(h\sigma)$.

Contexts. A context is a hedge $u \in \mathcal{H}(\Sigma, \mathcal{X})$ with a distinguished variable $x_u$ linear (with exactly one occurrence) in $u$. The application of a context $u$ to a hedge $h \in \mathcal{H}(\Sigma, \mathcal{X})$ is defined by $u[h] := u\{x_u \mapsto h\}$: it consists in inserting $h$ into $u$ at the position of $x_u$. Sometimes, we write $t[s]$ in order to emphasize that $s$ is a subterm (or subhedge) of $t$.

2.2 Hedge Automata 

We consider two typing formalisms for XML documents, defined as two classes of unranked tree automata. The first class is that of hedge automata [18], denoted HA. The most popular XML typing schemas, such as W3C XML Schema or Relax NG, are equivalent in expressiveness to HA. The second and perhaps lesser known class is that of context-free hedge automata, denoted CF-HA and introduced in [5D]. CF-HA are strictly more expressive than HA and we shall see that they are of interest for the typing of certain update operations.

Definition 1 A hedge automaton (resp. context-free hedge automaton) is a tuple $\mathcal{A} = (\Sigma, Q, Q^f, \Delta)$ where $\Sigma$ is a finite unranked alphabet, $Q$ is a finite set of states disjoint from $\Sigma$, $Q^f \subseteq Q$ is a set of final states, and $\Delta$ is a set of transitions of the form $a(L) \to q$ where $a \in \Sigma$, $q \in Q$ and $L \subseteq Q^*$ is a regular word language (resp. a context-free word language).

When $\Sigma$ is clear from the context it is omitted in the tuple specifying $\mathcal{A}$. We define the move relation between ground hedges $h, h' \in \mathcal{H}(\Sigma \cup Q)$ as follows: $h \xrightarrow[\mathcal{A}]{} h'$ iff there exist a context $u \in \mathcal{H}(\Sigma \cup Q, \{x_u\})$ and a transition $a(L) \to q \in \Delta$ such that $h = u[a(q_1 \ldots q_n)]$, with $q_1 \ldots q_n \in L$, and $h' = u[q]$. The relation $\xrightarrow[\mathcal{A}]{*}$ is the transitive closure of $\xrightarrow[\mathcal{A}]{}$.
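To make the move relation concrete, here is a minimal sketch of a bottom-up run of an HA on a ground term. It assumes, for illustration only, that each horizontal language $L$ is given as a Python regular expression over state names, each name being followed by one space; the function `states_of` and the sample automaton are hypothetical.

```python
import re
from itertools import product

def states_of(term, delta):
    """Set of states q such that the term reduces to q.

    delta is a list of transitions (label, regex, q), read as a(L) -> q,
    where regex describes L over space-terminated state names.
    """
    label, kids = term
    kid_states = [states_of(k, delta) for k in kids]
    result = set()
    for a, regex, q in delta:
        if a != label:
            continue
        # Try every word q1 ... qn of child states against L.
        for word in product(*kid_states):
            if re.fullmatch(regex, "".join(s + " " for s in word)):
                result.add(q)
                break
    return result

# Sample HA: a -> qa, b -> qb, g(qa* qb*) -> qf.
delta = [("a", r"", "qa"), ("b", r"", "qb"), ("g", r"(qa )*(qb )*", "qf")]
a, b = ("a", ()), ("b", ())
print(states_of(("g", (a, a, b)), delta))  # contains 'qf'
print(states_of(("g", (b, a)), delta))     # empty: b a is not in qa* qb*
```

Enumerating all words of child states is exponential in the worst case; it is used here only to keep the sketch short, and does not reflect the PTIME membership procedure mentioned later in this section.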

Collapsing Transitions. We consider the extension of HA and CF-HA with so-called collapsing transitions, which are special transitions of the form $L \to q$ where $L \subseteq Q^*$ is a CF language and $q$ is a state. The move relation for the extended set of transitions generalizes the above definition with the case $u[q_1 \ldots q_n] \xrightarrow[\mathcal{A}]{} u[q]$ if $L \to q$ is a collapsing transition of $\mathcal{A}$ and $q_1 \ldots q_n \in L$. Note that we do not exclude the case $n = 0$ in this definition, i.e. $L$ may contain the empty word in $L \to q$. Collapsing transitions with a singleton language $L$ containing a length-one word (i.e. transitions of the form $q \to q'$, where $q$ and $q'$ are states) correspond to $\varepsilon$-transitions for tree automata.

Languages. The language of a HA or CF-HA $\mathcal{A}$ in one of its states $q$, denoted by $L(\mathcal{A}, q)$, is the set of ground hedges $h \in \mathcal{H}(\Sigma)$ such that $h \xrightarrow[\mathcal{A}]{*} q$. We sometimes say that a hedge of $L(\mathcal{A}, q)$ has type $q$ (when $\mathcal{A}$ is clear from context). A hedge $h$ is accepted by $\mathcal{A}$ if there exists $q \in Q^f$ such that $h \in L(\mathcal{A}, q)$. The language of $\mathcal{A}$, denoted by $L(\mathcal{A})$, is the set of hedges accepted by $\mathcal{A}$.

Note that without collapsing transitions, all the hedges of $L(\mathcal{A}, q)$ are terms. Indeed, by applying standard transitions of the form $a(L) \to q$, one can only reduce length-one hedges into states. But collapsing transitions make it possible to reduce a ground hedge of length more than one into a single state.

The $\varepsilon$-transitions of the form $q \to q'$ do not increase the expressiveness of HA or CF-HA (see [5] for HA; the proof for CF-HA is similar). But the situation is not the same in general for collapsing transitions: collapsing transitions strictly extend HA in expressiveness, even when restricted to collapsing transitions of the form $L \to q$ where the left member $L$ is a finite (hence regular) word language.

Example 1 [TT]. The extended HA $\mathcal{A} = (\{q, q_a, q_b, q_f\}, \{g, a, b\}, \{q_f\}, \{a \to q_a, b \to q_b, g(q) \to q_f, q_a q_b \to q, q_a\, q\, q_b \to q\})$ recognizes $\{g(a^n b^n) \mid n \geq 1\}$, which is not a HA language.

However, collapsing transitions can be eliminated from CF-HA when restricting to the recognition of terms.

Lemma 1 ( flljj ) For every extended CF-HA $\mathcal{A}$ over $\Sigma$ with collapsing transitions, there exists a CF-HA $\mathcal{A}'$ without collapsing transitions such that $L(\mathcal{A}') = L(\mathcal{A}) \cap \mathcal{T}(\Sigma)$.

Properties. It is known that for both classes HA and CF-HA the membership and emptiness problems are decidable in PTIME [THIEI]. Moreover HA languages are closed under Boolean operations, but CF-HA languages are not closed under intersection and complementation. The intersection of a CF-HA language and a HA language is a CF-HA language. All these results are effective, with PTIME constructions of automata of polynomial size for the closures under union and intersection.

We call a HA or CF-HA $\mathcal{A} = (\Sigma, Q, Q^f, \Delta)$ normalized if for every $a \in \Sigma$ and every $q \in Q$, there is at most one transition rule $a(L_{a,q}) \to q$ in $\Delta$. Every HA (resp. CF-HA) can be transformed into a normalized HA (resp. CF-HA) in polynomial time by replacing every two rules $a(L_1) \to q$ and $a(L_2) \to q$ by $a(L_1 \cup L_2) \to q$.

2.3 Infinite Term Rewrite Systems 

We use term rewriting as a formalism for modeling XML update operations. For this purpose, we propose a non-standard definition of term rewriting, extending the classical one in two ways: first, the application of rewrite rules is extended from ranked terms to unranked terms; second, the rules are parametrized by HA languages (i.e. each parametrized rule can represent an infinite number of unparametrized rules).

Term Rewriting Systems. A term rewriting system $\mathcal{R}$ over a finite unranked alphabet $\Sigma$ (TRS) is a set of rewrite rules of the form $\ell \to r$ where $\ell \in \mathcal{H}(\Sigma, \mathcal{X}) \setminus \mathcal{X}$ and $r \in \mathcal{H}(\Sigma, \mathcal{X})$; $\ell$ and $r$ are respectively called left- and right-hand side (lhs and rhs) of the rule. Note that we do not assume the cardinality of $\mathcal{R}$ to be finite.

The rewrite relation $\xrightarrow[\mathcal{R}]{}$ of a TRS $\mathcal{R}$ is the smallest binary relation on $\mathcal{H}(\Sigma, \mathcal{X})$ containing $\mathcal{R}$ and closed by application of substitutions and contexts. In other words, $h \xrightarrow[\mathcal{R}]{} h'$ iff there exist a context $u$, a rule $\ell \to r \in \mathcal{R}$ and a substitution $\sigma$ such that $h = u[\ell\sigma]$ and $h' = u[r\sigma]$. The reflexive and transitive closure of $\xrightarrow[\mathcal{R}]{}$ is denoted $\xrightarrow[\mathcal{R}]{*}$.

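Example 2 With $\mathcal{R} = \{g(x) \to x\}$, we have $g(h) \xrightarrow[\mathcal{R}]{} h$ for all $h \in \mathcal{H}(\Sigma, \mathcal{X})$ (the term is reduced to the hedge $h$ of its arguments). With $\mathcal{R} = \{g(x) \to g(a\,x\,b)\}$, $g(c) \xrightarrow[\mathcal{R}]{*} g(a^n c\, b^n)$ for every $n \geq 0$.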
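The two systems of Example 2 can be replayed with a few lines of code. The sketch below is restricted, as an assumption of this example, to ground hedges and to rules whose left-hand side has the shape $a(x)$, where $x$ matches the whole hedge of children; a rule is encoded as a hypothetical pair `(a, rhs)` with `rhs` a function from the children to the replacement hedge.

```python
def term(label, *kids):
    return (label, tuple(kids))

def step(hedge, rules):
    """All hedges obtained from `hedge` by exactly one rewrite step."""
    out = set()
    for i, (lab, kids) in enumerate(hedge):
        for a, rhs in rules:
            if a == lab:  # redex at position i of the hedge
                out.add(hedge[:i] + rhs(kids) + hedge[i + 1:])
        for new_kids in step(kids, rules):  # redex strictly below position i
            out.add(hedge[:i] + ((lab, new_kids),) + hedge[i + 1:])
    return out

A, B, C = term("a"), term("b"), term("c")

# R = { g(x) -> x }: the term g(a b) is reduced to the hedge (a b).
collapse = [("g", lambda x: x)]
print(step((term("g", A, B),), collapse))

# R = { g(x) -> g(a x b) }: two steps from g(c) give g(a a c b b).
grow = [("g", lambda x: (term("g", A, *x, B),))]
h = (term("g", C),)
for _ in range(2):
    (h,) = step(h, grow)  # exactly one successor at each step here
print(h)
```

The context $u$ of the definition corresponds to the unchanged part of the hedge around the rewritten position, and the substitution $\sigma$ to the binding of $x$ to the children `kids`.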

Parametrized Term Rewriting Systems. Let $\mathcal{A} = (\Sigma, Q, Q^f, \Delta)$ be a HA. A term rewriting system over $\Sigma$ parametrized by $\mathcal{A}$ (PTRS) (see [10]) is given by a finite set, denoted $\mathcal{R}/\mathcal{A}$, of rewrite rules $\ell \to r$ where $\ell \in \mathcal{H}(\Sigma, \mathcal{X})$ and $r \in \mathcal{H}(\Sigma \uplus Q, \mathcal{X})$, and symbols of $Q$ can only label leaves of $r$. In this notation, $\mathcal{A}$ may be omitted when it is clear from context or not necessary. The rewrite relation $\xrightarrow[\mathcal{R}/\mathcal{A}]{}$ associated to a PTRS $\mathcal{R}/\mathcal{A}$ is defined as the rewrite relation $\xrightarrow[\mathcal{R}[\mathcal{A}]]{}$, where the TRS $\mathcal{R}[\mathcal{A}]$ is the (possibly infinite) set of all rewrite rules obtained from rules $\ell \to r$ in $\mathcal{R}/\mathcal{A}$ by replacing in $r$ every state $q \in Q$ by a ground hedge of $L(\mathcal{A}, q)$. Several examples of rewrite rules can be found in Figure 1 below.



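Properties. Given a set $L \subseteq \mathcal{H}(\Sigma, \mathcal{X})$ and a PTRS $\mathcal{R}/\mathcal{A}$, we denote $post^*_{\mathcal{R}/\mathcal{A}}(L) = \{h \in \mathcal{H}(\Sigma, \mathcal{X}) \mid \exists h' \in L,\ h' \xrightarrow[\mathcal{R}/\mathcal{A}]{*} h\}$ and $pre^*_{\mathcal{R}/\mathcal{A}}(L) = \{h \in \mathcal{H}(\Sigma, \mathcal{X}) \mid \exists h' \in L,\ h \xrightarrow[\mathcal{R}/\mathcal{A}]{*} h'\}$.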

Ground reachability is the problem of deciding, given two hedges $h, h' \in \mathcal{H}(\Sigma)$ and a PTRS $\mathcal{R}/\mathcal{A}$, whether $h \xrightarrow[\mathcal{R}/\mathcal{A}]{*} h'$. Reachability problems for ground ranked tree rewriting have been investigated in e.g. [TU]. C. Löding [T^] has obtained results in a more general setting where rules of type $L \to R$ specify the replacement of any element of a regular language $L$ by any element of a regular language $R$. Then [14] extended some of these works to unranked tree rewriting for the case of subtree and flat prefix rewriting, which is a combination of standard ground tree rewriting and prefix word rewriting on the ordered leaves of subtrees of height 1.

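Typechecking is the problem of deciding, given two sets of terms $\tau_{in}$ and $\tau_{out}$ called input and output types (generally presented as HA) and a PTRS $\mathcal{R}/\mathcal{A}$, whether $post^*_{\mathcal{R}/\mathcal{A}}(\tau_{in}) \subseteq \tau_{out}$ or, equivalently, $\tau_{in} \cap pre^*_{\mathcal{R}/\mathcal{A}}(\overline{\tau_{out}}) = \emptyset$, where $\overline{\tau_{out}}$ denotes the complement of $\tau_{out}$.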

Note that reachability is a special case of typechecking, when both $\tau_{in}$ and $\tau_{out}$ are singleton sets. Hence typechecking is undecidable as soon as reachability is.

One related problem, called type inference, is, given a PTRS $\mathcal{R}/\mathcal{A}$ and a HA or CF-HA language $L$, to construct a HA or CF-HA recognizing $post^*_{\mathcal{R}/\mathcal{A}}(L)$ or $pre^*_{\mathcal{R}/\mathcal{A}}(L)$.



3 Type Inference for Update Operations 

In this section, we address the problem of type inference for arbitrary finite sequences of update operations. More precisely, we propose a redefinition in terms of PTRS rules (Section 3.1) of the update operations of XACU [5] and some extensions. Then, we show how to construct HA and CF-HA recognizing respectively $post^*_{\mathcal{R}/\mathcal{A}}(L)$ and $pre^*_{\mathcal{R}/\mathcal{A}}(L)$, given a HA or CF-HA language $L$ and a PTRS $\mathcal{R}$ representing XACU operations (Section 3.2) or extended updates (Section 3.3).

XACU                                  XACU+
a(x)   ->  b(x)         REN
a(x)   ->  a(p x)       INS_first     a(x)  ->  b(p x)        INS'_first
a(x)   ->  a(x p)       INS_last      a(x)  ->  b(x p)        INS'_last
a(x y) ->  a(x p y)     INS_into
a(x)   ->  p a(x)       INS_left
a(x)   ->  a(x) p       INS_right
a(x)   ->  p            RPL           a(x)  ->  p_1 ... p_n   RPL'
a(x)   ->  ()           DEL           a(x)  ->  x             DEL_s

Figure 1: PTRS rules for XACU and extension

The motivation for showing these results is twofold. First, these constructions make it possible to address the problems of reachability and typechecking. Second, they also permit the synthesis of missing input or output types. Imagine that a PTRS $\mathcal{R}$ is given, as well as an input type $\tau_{in}$, defined as an HA, but that the output type (for the application of rules of $\mathcal{R}$ to terms of $\tau_{in}$) is not known. The result of Theorem 2 ensures that we can build a CF-HA recognizing $post^*_{\mathcal{R}}(\tau_{in})$, which can be used as a definition of a synthesized output type for $\mathcal{R}$. Similarly, the result of Theorem 3 can be used to synthesize an input type, defined by the HA constructed for $pre^*_{\mathcal{R}}(\tau_{out})$, given an output type $\tau_{out}$ and a PTRS $\mathcal{R}/\mathcal{A}$.

3.1 Update Operations 

Figure 1 displays PTRS rules corresponding to the rules of XACU as defined in [S] (in the first column) and to some extensions (in the second column). We call XACU the class of all PTRS containing rules of the kind presented in the first column of Figure 1, and XACU+ the class of all PTRS containing any rule presented in Figure 1.

In this section we assume given an unranked alphabet $\Sigma$ and a HA $\mathcal{A} = (\Sigma, Q, Q^f, \Delta)$. The rewrite rules are parametrized by states $p, p_1, \ldots, p_n$ of $\mathcal{A}$.

XACU rules. Let us first describe the update operations of XACU (see also [H]). REN renames a node: it changes its label from $a$ into $b$. Such a rule leaves the structure of the term unchanged. INS_first inserts a term of type $p$ at the first position below a node labeled by $a$. INS_last inserts at the last position and INS_into at an arbitrary position below a node labeled by $a$. INS_left (resp. INS_right) inserts a term of type $p$ at the left (resp. right) sibling position of a node labeled by $a$. DEL deletes a whole subterm whose root node is labeled by $a$, and RPL replaces such a subterm by a term of type $p$.
Example 3 The patient data in a hospital are stored in an XML document whose DTD type can be recognized by an HA $\mathcal{A}$ with rules:

hospital(p_pa*)             -> p_h      name(p_c*)       -> p_n      a -> p_c
patient(p_n p_t)            -> p_pa     drug(p_c*)       -> p_dr     b -> p_c
patient(p_n)                -> p_epa    diagnosis(p_c*)  -> p_dia    c -> p_c
treatment(p_dr p_dia p_da)  -> p_t      date(p_c*)       -> p_da

For instance, we can use a DEL rule $patient(x) \to ()$ for deleting a patient, and an INS_last rule $hospital(x) \to hospital(x\, p_{pa})$ to insert a new patient at the last position below the root node hospital. We can ensure that the newly added patient has an empty treatment list (to be completed later) using the rule $hospital(x) \to hospital(x\, p_{epa})$. The INS_right rule $name(x) \to name(x)\, p_t$ can be used to insert later a treatment next to the patient's name.
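The XACU rules used in this example can be sketched as small constructors. This is an illustration under stated assumptions: a rule is a hypothetical pair `(a, rhs)`, the parameter `p` is already instantiated by one ground term of type $p$ (as in the definition of $\mathcal{R}[\mathcal{A}]$), INS_into is omitted because it yields one result per insertion position, and `apply_at_root` is a helper introduced only for this example.

```python
def term(label, *kids):
    return (label, tuple(kids))

# Each constructor returns (a, rhs): rhs maps the children x of the
# matched a-node to the hedge that replaces the whole subterm a(x).
def REN(a, b):        return (a, lambda x: ((b, x),))
def INS_first(a, p):  return (a, lambda x: ((a, (p,) + x),))
def INS_last(a, p):   return (a, lambda x: ((a, x + (p,)),))
def INS_left(a, p):   return (a, lambda x: (p, (a, x)))
def INS_right(a, p):  return (a, lambda x: ((a, x), p))
def RPL(a, p):        return (a, lambda x: (p,))
def DEL(a):           return (a, lambda x: ())

def apply_at_root(rule, t):
    """Result hedge of applying the rule at the root of t, or None."""
    a, rhs = rule
    label, kids = t
    return rhs(kids) if a == label else None

# A patient with a name but no treatment, standing for the type p_epa.
empty_patient = term("patient", term("name", term("a")))

print(apply_at_root(INS_last("hospital", empty_patient), term("hospital")))
print(apply_at_root(DEL("patient"), empty_patient))
```

Note that the rules with a hedge on the right-hand side (INS_left, INS_right, DEL) change the number of siblings at the update position, which is why `rhs` returns a hedge rather than a single term.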



Extended rules. In XACU+ we introduce several extensions of the rules of XACU. We shall see in Section 3.3 that the typing of these extended operations is different from the typing of the operations of XACU: while the type of terms obtained by XACU operations can be described by HA, CF-HA must be used in order to describe the type of terms obtained by XACU+. A restriction of the insertion rules of XACU (the rules called INS_*), following the definitions in [S], is that the label of the node at the top of the lhs of the rules is left unchanged. Only the rule REN makes it possible to change the label of a node in a term, while preserving the other nodes. The rules INS'_* combine the application of the corresponding insert operation INS_* and of a node renaming REN. We will see in Section 3.3 that allowing such combinations has important consequences with respect to type inference.

The rule DEL_s deletes a single node $n$, whose arguments inherit its position. It can be employed to build a user view as in [7].

Example 4 Assume that some patients of the hospital of Example 3 are grouped into one category, as in $hospital(\ldots\, priority(p_{pa}^*)\, \ldots)$, and that we want to delete the category priority while keeping the patients' information. This can be done with the DEL_s rule $priority(x) \to x$.

Finally, with RPL' we slightly generalize the rule RPL by allowing a subterm whose root node is labeled by $a$ to be replaced by a sequence of $n$ terms of respective types $p_1, \ldots, p_n$.

Note that RPL and DEL are special cases of RPL', with $n = 1$ and $n = 0$ respectively.



3.2 Forward Type Inference for XACU Rules 

In this section and the following, we want to characterize the sets of terms which can be obtained, from terms of a given type, by arbitrary application of update operations seen as PTRS rules. For this purpose, we shall study the recognizability (by HA and CF-HA) of the forward closure ($post^*$) of automata languages under the above rewrite rules.
Theorem 1 Given a HA $\mathcal{A}$ on $\Sigma$ and a PTRS $\mathcal{R}/\mathcal{A} \in$ XACU, for every HA language $L$, $post^*_{\mathcal{R}/\mathcal{A}}(L)$ is the language of an HA of polynomial size which can be constructed in PTIME in the size of $\mathcal{R}$, $\mathcal{A}$ and an HA of language $L$.

Proof. (sketch, see Appendix B for a complete proof). We consider a normalised HA $\mathcal{A}_L$ recognizing $L$ and add transitions (but no states) to the NFAs defining its horizontal languages in transitions $a(L_{a,q}) \to q$. For instance, if $a(x) \to a(p\,x) \in \mathcal{R}/\mathcal{A}$, we add one transition $(i_{a,q}, p, i_{a,q})$ looping on the initial state $i_{a,q}$ of the NFA for $L_{a,q}$. If $a(x) \to a(x)\,p \in \mathcal{R}/\mathcal{A}$, and there exists a transition $(s, q, s')$ in some NFA, we add one transition $(s', p, s')$. □
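The two saturation steps mentioned in the proof sketch can be illustrated on a single horizontal NFA. This is a rough sketch, not the construction of Appendix B: the NFA encoding `(init, finals, transitions)` and the function names are assumptions of this example, only the two rule shapes named in the sketch are handled, and state $p$ of $\mathcal{A}$ is treated as an ordinary input symbol.

```python
def accepts(nfa, word):
    """Standard subset simulation; nfa = (init, finals, transitions)."""
    init, finals, trans = nfa
    current = {init}
    for sym in word:
        current = {t for (s, x, t) in trans if s in current and x == sym}
    return bool(current & finals)

def saturate_ins_first(nfa, p):
    """For a(x) -> a(p x): loop on the initial state of the NFA for L_{a,q}."""
    init, finals, trans = nfa
    return (init, finals, trans | {(init, p, init)})

def saturate_ins_right(nfa, q, p):
    """For a(x) -> a(x) p, with q typing a-rooted terms:
    add a p-loop on the target of every q-transition."""
    init, finals, trans = nfa
    return (init, finals, trans | {(t, p, t) for (s, x, t) in trans if x == q})

# Horizontal language { q1 q2 } as a three-state NFA.
nfa = (0, {2}, {(0, "q1", 1), (1, "q2", 2)})

first = saturate_ins_first(nfa, "p")
print(accepts(first, ["p", "p", "q1", "q2"]))   # p inserted at first position

right = saturate_ins_right(nfa, "q1", "p")
print(accepts(right, ["q1", "p", "p", "q2"]))   # p inserted right after q1
```

No state is added in either step, only transitions, which is consistent with the polynomial size bound stated in Theorem 1.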
Let us come back to our motivations. A first consequence of Theorem 1 concerns the typechecking problem.

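Corollary 1 Typechecking is decidable in PTIME for XACU.

Proof. Let $\tau_{in}$ and $\tau_{out}$ be two HA languages (resp. input and output types), and let $\mathcal{R}/\mathcal{A}$ be a PTRS. We want to know whether $post^*_{\mathcal{R}/\mathcal{A}}(\tau_{in}) \subseteq \tau_{out}$. By Theorem 1, $post^*_{\mathcal{R}/\mathcal{A}}(\tau_{in})$ is a HA language. Hence $post^*_{\mathcal{R}/\mathcal{A}}(\tau_{in}) \cap \overline{\tau_{out}}$, where $\overline{\tau_{out}}$ is the complement of $\tau_{out}$, is a HA language, and testing its emptiness solves the problem. □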
Regarding the problem of type synthesis, if we are given $\mathcal{R}/\mathcal{A}$ and an input type $\tau_{in}$, Theorem 1 provides an output type presented as a HA.



3.3 Forward and Backward Type Inference for XACU+ Rules

Theorem 1 is no longer true for the rules of the extension XACU+: the examples below show that the rules of XACU+ \ XACU do not preserve HA languages in general. However, we prove in Theorem 2 that the rules of XACU+ preserve the larger class of CF-HA languages.

Example 5 Let $\Sigma = \{a, b, c, c'\}$ and let $\mathcal{R}$ be the finite TRS containing the two INS'_first and INS'_last rules $c(x) \to c'(a\,x)$ and $c'(x) \to c(x\,b)$. We have $post^*_{\mathcal{R}}(\{c\}) \cap c(\{a, b\}^*) = \{c(a^n b^n) \mid n \geq 0\}$, and this set is not a HA language. It follows that $post^*_{\mathcal{R}}(\{c\})$ is not a HA language.
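The reachable set of Example 5 can be explored by a bounded search. In this sketch a term is abbreviated, as an assumption valid for this example only, by the pair (root label, word formed by its leaf children): starting from $c$, all children are leaves labeled $a$ or $b$, so the only redex is always at the root. The names `successors` and `post_bounded` are hypothetical.

```python
def successors(state):
    """One rewrite step on a term given as (root label, word of leaves)."""
    root, word = state
    if root == "c":    # INS'_first rule: c(x) -> c'(a x)
        return {("c'", "a" + word)}
    if root == "c'":   # INS'_last rule:  c'(x) -> c(x b)
        return {("c", word + "b")}
    return set()

def post_bounded(start, depth):
    """Terms reachable from `start` in at most `depth` steps."""
    seen, frontier = {start}, {start}
    for _ in range(depth):
        frontier = {s2 for s in frontier for s2 in successors(s)} - seen
        seen |= frontier
    return seen

reach = post_bounded(("c", ""), 6)
c_rooted = sorted(w for root, w in reach if root == "c")
print(c_rooted)  # words below a root c: each has the form a^n b^n
```

The search only under-approximates $post^*$, but it shows the counting behaviour (as many $a$ as $b$ below $c$) that prevents recognition by an HA, whose horizontal languages are regular.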

Example 6 Let $\Sigma = \{a, b, c\}$, let $\mathcal{R}$ be the finite TRS with one DEL_s rule $c(x) \to x$ and let $L$ be the HA language containing exactly the terms $c(a\, c(a \ldots c \ldots b)\, b)$; it is recognized by the HA with the set of transition rules $\{a \to q_a,\ b \to q_b,\ c(\{(),\ q_a\, q\, q_b\}) \to q\}$. We have $post^*_{\mathcal{R}}(L) \cap c(\{a, b\}^*) = \{c(a^n b^n) \mid n \geq 0\}$, hence $post^*_{\mathcal{R}}(L)$ is not a HA language.

Theorem 2 Given a HA $\mathcal{A}$ on $\Sigma$ and a PTRS $\mathcal{R}/\mathcal{A} \in$ XACU+, for every CF-HA term language $L$, $post^*_{\mathcal{R}/\mathcal{A}}(L)$ is the language of a CF-HA of polynomial size which can be constructed in PTIME in the size of $\mathcal{R}$, $\mathcal{A}$ and a CF-HA recognizing $L$.

Proof. (sketch, see Appendix C for a complete proof). We consider a normalised CF-HA $\mathcal{A}_L$ recognizing $L$ and, very roughly, we define new CFGs $\mathcal{G}_{a,q}$ for the horizontal languages as the union of the CFGs of the transitions of $\mathcal{A}_L$, with a new initial non-terminal $I'_{a,q}$ and new production rules according to $\mathcal{R}/\mathcal{A}$. For instance, if $a(x) \to b(x) \in \mathcal{R}/\mathcal{A}$, we add a production rule $I'_{b,q} := I'_{a,q}$, and for $a(x) \to b(p\,x)$, we add $I'_{b,q} := p\, I'_{a,q}$. Moreover, we also add collapsing transitions like $p_1 \ldots p_n \to q$ if $a(x) \to p_1 \ldots p_n \in \mathcal{R}/\mathcal{A}$. □
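As an illustration of the production rules in this sketch, consider again the two rules $c(x) \to c'(a\,x)$ and $c'(x) \to c(x\,b)$ with initial language $\{c\}$. Following the pattern above, one would obtain productions of the shape $I'_{c} := \varepsilon$, $I'_{c'} := a\, I'_{c}$ and $I'_{c} := I'_{c'}\, b$. The code below enumerates short words of such a grammar; the grammar encoding, the non-terminal names `Ic` and `Ic'`, the use of leaf labels in place of states, and the function `words` are assumptions made for this example.

```python
def words(grammar, start, max_len):
    """Terminal words of length <= max_len derivable from `start`.

    grammar maps each non-terminal to a list of right-hand sides (tuples).
    Sentential forms with more than max_len terminals are pruned.
    """
    done, todo, seen = set(), [(start,)], {(start,)}
    while todo:
        form = todo.pop()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:  # no non-terminal left: a terminal word
            done.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(s not in grammar for s in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return done

grammar = {
    "Ic": [(), ("Ic'", "b")],   # initial language {c}, and c'(x) -> c(x b)
    "Ic'": [("a", "Ic")],       # c(x) -> c'(a x)
}
print(sorted(words(grammar, "Ic", 4)))
```

The horizontal language obtained below $c$ is context-free but not regular, which matches the need for CF-HA in Theorem 2. The enumeration terminates here because every derivation cycle adds terminals; this is not guaranteed for arbitrary grammars.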

Corollary 2 Typechecking is decidable in PTIME for XACU+.

Proof. The proof is the same as for Corollary 1, because the intersection of a CF-HA language and a HA language is a CF-HA language (and there is an effective PTIME construction of a CF-HA of polynomial size) and emptiness of CF-HA is decidable in PTIME. □

Theorem 3 Given a HA $\mathcal{A}$ on $\Sigma$ and a PTRS $\mathcal{R}/\mathcal{A} \in$ XACU+, for every HA language $L$, $pre^*_{\mathcal{R}/\mathcal{A}}(L)$ is a HA language.

Regarding the problem of type synthesis for $\mathcal{R}/\mathcal{A} \in$ XACU+, if only an output type $\tau_{out}$ is given, then Theorem 3 provides an input type for $\mathcal{R}/\mathcal{A}$ presented as a HA, and if only an input type $\tau_{in}$ is given, then Theorem 2 provides an output type presented as a CF-HA. Unlike HA, CF-HA are not popular type schemas, but HA alone do not permit extending the results of Theorem 1, as shown by the above examples.

4 Access Control Policies for Updates 

In this last section we study some models of Access Control Policies (ACP) for the update operations defined in Section 3, and verification problems for these ACP.

4.1 Term Rewrite Systems with Global Membership Constraints

The ACP language XACU$_{annot}$ introduced in [8] follows the approach of DTDs with security annotations of [7] to specify the read and write access authorizations for XML documents in the presence of a DTD. Annotated DTDs offer an elegant formalism for ACP specification, which is especially convenient for developing techniques of type analysis. However, it imposes the strong restriction that every document $t$ to which we want to apply an update operation (under the given ACP) must comply with the DTD $D$ used for the ACP specification.

In our rewrite-based formalism, this condition may be expressed by adding global constraints to the parametrized rewrite rules of Section 2.3. These global constraints restrict the whole term to be rewritten (not only the redex) to belong to a given regular language. Theorem 4 below shows that, unfortunately, adding such constraints to ground rules (which are a very special kind of RPL rules) makes reachability undecidable.

Given a HA $\mathcal{A} = (\Sigma, Q, Q^f, \Delta)$, a term rewriting system over $\Sigma$, parametrized by $\mathcal{A}$ and with global constraints (PGTRS), is given by a finite set, denoted $\mathcal{R}/\mathcal{A}$, of constrained rewrite rules $L :: \ell \to r$ where $\ell$ and $r$ satisfy the conditions of the rewrite rules of Section 2.3 and $L \subseteq \mathcal{T}(\Sigma)$ is a HA language. A PGTRS is called uniform if the language $L$ is the same for every rule. The rewrite relation for PGTRS is defined as the restriction of the relation defined in Section 2.3 to ground terms: for the application of a rule $L :: \ell \to r$ to a term $t$, we require that $t \in L$.
XACU_2                                           XACU_2+
b(y a(x) z)  ->  b(y p a(x) z)    INS_2,left
b(y a(x) z)  ->  b(y a(x) p z)    INS_2,right
b(y a(x) z)  ->  b(y p z)         RPL_2          b(y a(x) z)  ->  b(y p_1 ... p_n z)   RPL'_2
b(y a(x) z)  ->  b(y z)           DEL_2          b(y a(x) z)  ->  b(y x z)             DEL_2,s

Figure 2: PTRS rules for XACU with context control

Theorem 4 Reachability is undecidable for uniform PGTRS without variables and parameters.

The result can be contrasted with some decidability results on ground rewriting [TU]. It is also a refinement of [5], where XPath queries are used to filter out nodes where the updates apply. As a corollary, reachability, hence inconsistency (see Section 4.3), are undecidable for XACU$_{annot}$ ACP based on annotated recursive DTDs.

4.2 XACU2+: Rewrite Rules with Context Control 

The PTRS rewrite rules of Section 3 make it possible to define a minimal control on the application of the update operations. Indeed, all the lhs of rules have the form $a(x)$ (or $a(x\,y)$ for INS_into), meaning that the application of such rules is restricted to nodes labeled with $a$ (i.e. to nodes of DTD element type $a$ if the document conforms to a given fixed DTD).

For the rules with a hedge at the rhs (like INS_left, INS_right, RPL, DEL, DEL_s, ...) we can extend this idea by furthermore constraining the label of the parent node of the performed update. The generalized rules are defined in Figure 2.

Example 7 The DEL_2 rule $hospital(y\ patient(x)\ z) \to hospital(y\,z)$ can be used to delete a patient only if it is located under a hospital node.

This approach can be compared to the annotated DTDs of [7]. The security annotations of [7] are indeed mappings $ann$ from pairs of DTD element types $(b, a)$ into values $Y$, $N$ or $[q]$ (for resp. read access allowed, denied or conditionally allowed, where $q$ is an XPath qualifier). An annotation $ann(b, a) = Y$ or $N$ or $[q]$ indicates that the $a$ children of $b$ elements (in an instantiation of the given DTD $D$) are accessible, inaccessible or conditionally accessible respectively. This approach is limited to the case of unambiguous DTDs, where the element type $a$ can have at most one element $b$ as parent.

Let us call XACU_2+ the class of all PTRS containing rules of XACU+ or rules of the kind described in Figure 2. The construction of Theorem 3 for backward type inference can be straightforwardly extended from XACU+ to XACU_2+.

Theorem 5 Given a HA $\mathcal{A}$ on $\Sigma$ and a PTRS $\mathcal{R}/\mathcal{A} \in$ XACU_2+, for every HA language $L$, $pre^*_{\mathcal{R}/\mathcal{A}}(L)$ is a HA language.

4.3 Local Inconsistency of ACP

Following e.g. [2], an ACP for XML updates can be defined by a pair $(\mathcal{R}_a/\mathcal{A}, \mathcal{R}_f/\mathcal{A})$ of PTRS, where $\mathcal{R}_a$ contains allowed operations and $\mathcal{R}_f$ contains forbidden operations. Such an ACP is called inconsistent [5], [2] if some forbidden operation can be simulated through a sequence of allowed operations.

Example 8 Assume that in the hospital document of Example 3 it is forbidden
to rename a patient, that is, the following update of $\mathsf{RPL}_2$ is forbidden:
$\mathit{patient}(y\ \mathit{name}(x)\ z) \to \mathit{patient}(y\, p_n\, z)$.

If the following updates are allowed: $\mathit{patient}(x) \to ()$ for deleting a patient, and
$\mathit{hospital}(x) \to \mathit{hospital}(x\, p_{pa})$ to insert a patient, then we have an inconsistency
in the sense of [2], since the effect of the forbidden update can be obtained by
a combination of allowed updates.

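Using the results of Section 3 we can decide the above problem individually
for terms of $D$. More precisely, we solve the following problem called local
inconsistency: given a HA $\mathcal{A}$ over $\Sigma$, an ACP $(\mathcal{R}_a/\mathcal{A}, \mathcal{R}_f/\mathcal{A})$ and a term
$t \in T(\Sigma)$, does there exist $u \in T(\Sigma)$ such that $t \xrightarrow[\mathcal{R}_f/\mathcal{A}]{} u$ and $t \xrightarrow[\mathcal{R}_a/\mathcal{A}]{*} u$?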

Theorem 6 Local inconsistency is decidable in PTIME for XACU+.

Proof. It can easily be shown that the set $\{u \in T(\Sigma) \mid t \xrightarrow[\mathcal{R}_f/\mathcal{A}]{} u\}$ is the
language of a HA of polynomial size, constructed in PTIME in the sizes
of $\mathcal{A}$, $\mathcal{R}_f$ and $t$. By Theorem 2, $\mathit{post}^*_{\mathcal{R}_a/\mathcal{A}}(\{t\})$ is the language of a CF-HA of
polynomial size, constructed in polynomial time in the sizes of $\mathcal{A}$, $\mathcal{R}_a$ and
$t$. The ACP is locally inconsistent w.r.t. $t$ iff the intersection of the two above
languages is non-empty, and this property can be tested in polynomial time. □
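
To make the definition of local inconsistency concrete, here is a brute-force sketch for Example 8. It is deliberately not the automata construction of the proof: it enumerates allowed rewrite sequences up to a bound `max_steps`, and it replaces the parameters $p_n$ and $p_{pa}$ by small finite samples (`NAMES`, `PATIENTS`). All names, the samples and the bound are assumptions made for illustration only.

```python
def node(label, *kids):
    """A tree is (label, children); a hedge is a tuple of trees."""
    return (label, tuple(kids))

def step(hedge, rule):
    """All hedges reachable from `hedge` by one application of `rule`.
    `rule(tree)` returns the list of hedges that may replace `tree`."""
    out = set()
    for i, tree in enumerate(hedge):
        label, kids = tree
        for repl in rule(tree):
            out.add(hedge[:i] + repl + hedge[i + 1:])
        for new_kids in step(kids, rule):
            out.add(hedge[:i] + ((label, new_kids),) + hedge[i + 1:])
    return out

# Hypothetical finite samples standing in for L(A, p_n) and L(A, p_pa).
NAMES = [node("name", node("alice")), node("name", node("bob"))]
PATIENTS = [node("patient", n) for n in NAMES]

def del_patient(tree):            # allowed: patient(x) -> ()
    return [()] if tree[0] == "patient" else []

def ins_patient(tree):            # allowed: hospital(x) -> hospital(x p_pa)
    label, kids = tree
    if label != "hospital":
        return []
    return [((label, kids + (p,)),) for p in PATIENTS]

def rename_patient(tree):         # forbidden: patient(y name(x) z) -> patient(y p_n z)
    label, kids = tree
    out = []
    if label == "patient":
        for i, k in enumerate(kids):
            if k[0] == "name":
                # Identity replacements are skipped so that the zero-step
                # case does not match trivially (a simplification).
                out += [((label, kids[:i] + (n,) + kids[i + 1:]),)
                        for n in NAMES if n != k]
    return out

def locally_inconsistent(t, allowed, forbidden, max_steps=3):
    """Bounded check: is some one-step forbidden successor of `t`
    reachable from `t` with at most `max_steps` allowed steps?"""
    targets = set()
    for f in forbidden:
        targets |= step(t, f)
    seen, frontier = {t}, {t}
    for _ in range(max_steps):
        frontier = {u for s in frontier for r in allowed for u in step(s, r)} - seen
        seen |= frontier
    return bool(seen & targets)

t = (node("hospital", PATIENTS[0]),)
print(locally_inconsistent(t, [del_patient, ins_patient], [rename_patient]))
```

A negative answer from this sketch only means that no witness was found within the bound and the samples; the decision procedure of Theorem 6 works on automata and does not have this limitation.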

Conclusion 

We have proposed a model for XML updates based on term rewriting, and shown
that type inference is possible and that the problems of reachability and
typechecking are decidable for the arbitrary application of XACU update rules,
as well as some extensions, when the application is only controlled by the
label of the node at the update position and also at its parent node. We have
also shown that these problems become undecidable when restricting the
application of update operations to documents conforming to a fixed given DTD.

As further work, we could study restrictions on the regular tree languages
in the constraints of PGTRS enabling the decidability of typechecking for XACU
rules with global constraints. Another interesting topic, regarding the
verification of ACPs for updates based on annotated DTDs, is access conditioned
by XPath queries. We could model this with rewrite rules constrained by XPath
qualifiers. Reachability is undecidable for such a formalism, even when the
rules are ground (a consequence of a result of [8]). However, the construction
of [8] involves upward navigation; some fragments of downward Core XPath could
permit to obtain decidability.

Actually in [8], the undecidability of the inconsistency problem is stated, but
the construction in that paper proves the undecidability of reachability as well.

A Appendix: proof of Lemma 1

In this proof and the following, we describe the CF grammars used for defining
the horizontal languages of CF-HA transitions as tuples $\mathcal{G} = (\Sigma, N, I, \Gamma)$, where
$\Sigma$ is a finite alphabet (set of terminal symbols), $N$ is a set of non-terminal
symbols, $I \in N$ is the initial non-terminal, and $\Gamma \subseteq N \times (N \cup \Sigma)^*$ is a set of
production rules.

Lemma 1 [11]. For every extended CF-HA $\mathcal{A}$ over $\Sigma$ with collapsing transitions,
there exists a CF-HA $\mathcal{A}'$ without collapsing transitions such that
$L(\mathcal{A}') \cap T(\Sigma) = L(\mathcal{A}) \cap T(\Sigma)$.

Proof. Let $\mathcal{G} = (Q, N, I, \Gamma)$ and $\mathcal{G}_1 = (Q, N_1, I_1, \Gamma_1)$ be two CF grammars over
the same finite alphabet $Q$. Below, $\mathcal{G}$ and $\mathcal{G}_1$ are respectively meant to generate
the languages $L$ and $L_1$ of CF-HA transitions $L \to q$ and $a(L_1) \to p$. We
assume wlog that the sets of non-terminals $N$ and $N_1$ of $\mathcal{G}$ and $\mathcal{G}_1$ respectively
are disjoint. Let $q \in Q$ be a terminal symbol and let $X_q$ be a fresh non-terminal
symbol. We consider below the CF grammar

$\mathcal{G}_1[q \leftarrow \mathcal{G}] := \big(Q,\ N_1 \uplus N \uplus \{X_q\},\ I_1,\ \Gamma_1[q \leftarrow X_q] \cup \Gamma[q \leftarrow X_q] \cup \{X_q := q,\ X_q := I\}\big)$

where $\Gamma[q \leftarrow X_q]$ denotes the set of production rules of $\Gamma$ where every
occurrence of the terminal symbol $q$ is replaced by the non-terminal $X_q$. Using
this construction, we can get rid of collapsing transitions in CF-HA.

We assume that $\mathcal{A}$ is normalized with state set $Q$ and for each $a \in \Sigma$ and
$p \in Q$, we let $\mathcal{G}_{a,p}$ be the CF grammar generating the language $L_{a,p}$ in the
transition (assumed unique) $a(L_{a,p}) \to p$ of $\mathcal{A}$. In order to construct $\mathcal{A}'$ out of
$\mathcal{A}$, we perform the following operations for every collapsing transition $L \to q$ of
$\mathcal{A}$: (i) delete $L \to q$; (ii) for each $a \in \Sigma$ and $p \in Q$, replace $\mathcal{G}_{a,p}$ by
$\mathcal{G}_{a,p}[q \leftarrow \mathcal{G}]$, where $\mathcal{G}$ is a CF grammar generating $L$. □
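
The grammar substitution used in this proof can be sketched as follows. The dictionary encoding of grammars, the function names `substitute` and `words`, and the toy grammars are assumptions made for illustration; `words` is a naive bounded enumerator that is only correct for grammars without ε-productions.

```python
def substitute(g1, g, q, fresh):
    """Build G1[q <- G]: every terminal occurrence of `q` in the productions
    of G1 and G becomes the fresh non-terminal `fresh`, which derives either
    `q` itself or the start symbol of G. Non-terminals of G1 and G are
    assumed disjoint, and `fresh` is assumed to be new."""
    def rename(prods):
        return {n: [tuple(fresh if s == q else s for s in rhs) for rhs in rhss]
                for n, rhss in prods.items()}
    prods = rename(g1["prods"])
    prods.update(rename(g["prods"]))
    prods[fresh] = [(q,), (g["start"],)]
    return {"start": g1["start"], "prods": prods}

def words(grammar, max_len):
    """Terminal words of length <= max_len generated by `grammar`
    (leftmost derivations; assumes there are no epsilon-productions)."""
    prods = grammar["prods"]
    seen, todo, result = set(), [(grammar["start"],)], set()
    while todo:
        form = todo.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        idx = next((i for i, s in enumerate(form) if s in prods), None)
        if idx is None:
            result.add(form)          # no non-terminal left: a terminal word
            continue
        for rhs in prods[form[idx]]:
            todo.append(form[:idx] + rhs + form[idx + 1:])
    return result

# Toy instance: a(L1) -> p with L1 = {q q}, and a collapsing transition L -> q
# with L = {r}. After substitution, each q may also be "paid for" by r.
g1 = {"start": "I1", "prods": {"I1": [("q", "q")]}}
g = {"start": "I", "prods": {"I": [("r",)]}}
print(sorted(words(substitute(g1, g, "q", "Xq"), 2)))
```

In this toy instance the horizontal language of the transition grows from $\{qq\}$ to $\{qq, qr, rq, rr\}$, which is what removing the collapsing transition $L \to q$ requires.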

B Appendix: proof of Theorem [1] 

In this proof and the following, we describe finite automata for the horizontal
languages of HA transitions as tuples $B = (\Sigma, S, i, F, \Gamma)$, where $\Sigma$ is the finite
input alphabet, $S$ is a finite set of states, $i$ is the initial state, $F \subseteq S$ is the set of
final states and $\Gamma \subseteq S \times (\Sigma \cup \{\varepsilon\}) \times S$ is the set of transitions and ε-transitions.
For $s, s' \in S$, we write $s \xrightarrow[B]{\varepsilon} s'$ to express that $s'$ can be reached from $s$ by
a sequence of ε-transitions of $B$, and $s \xrightarrow[B]{a_1 \ldots a_n} s'$, for $a_1, \ldots, a_n \in \Sigma$, if there
exist $2(n+1)$ states $s_0, s'_0, \ldots, s_n, s'_n \in S$ with $s_0 = s$, $s_n \xrightarrow[B]{\varepsilon} s'$ and, for all
$0 \le i < n$, $s_i \xrightarrow[B]{\varepsilon} s'_i$ and $(s'_i, a_{i+1}, s_{i+1}) \in \Gamma$.

Theorem 1 Given a HA $\mathcal{A}$ on $\Sigma$ and a PTRS $\mathcal{R}/\mathcal{A} \in \mathrm{XACU}$, for all HA
languages $L$, $\mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$ is the language of a HA of polynomial size which
can be constructed in PTIME in the sizes of $\mathcal{R}$, $\mathcal{A}$ and a HA recognizing $L$.

Proof. Let $\mathcal{A} = (\Sigma, P, P^f, \Theta)$ and let $\mathcal{A}_L = (\Sigma, Q_L, Q_L^f, \Delta_L)$ recognize $L$. We
assume that both $\mathcal{A}$ and $\mathcal{A}_L$ are normalized and that their state sets $P$ and $Q_L$
are disjoint. We construct a HA $\mathcal{A}' = (\Sigma, P \uplus Q_L, Q_L^f, \Delta')$ recognizing $\mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$.
For each $a \in \Sigma$, $q \in Q_L$, let $L_{a,q}$ be the regular language in the transition
(assumed unique) $a(L_{a,q}) \to q \in \Delta_L$, and let $B_{a,q} = (Q_L, S_{a,q}, i_{a,q}, \{f_{a,q}\}, \Gamma_{a,q})$
be a finite automaton recognizing $L_{a,q}$. It has input alphabet $Q_L$, set of states
$S_{a,q}$, initial state $i_{a,q} \in S_{a,q}$, final state $f_{a,q} \in S_{a,q}$ (that we assume unique
wlog) and set of transition rules $\Gamma_{a,q} \subseteq S_{a,q} \times Q_L \times S_{a,q}$. The sets of states
$S_{a,q}$ are assumed pairwise disjoint. Let $S$ be the disjoint union of all $S_{a,q}$ for all
$a \in \Sigma$ and $q \in Q_L$.

For the construction of $\mathcal{A}'$, we develop a set of transition rules
$\Gamma' \subseteq S \times (P \cup Q_L \cup \{\varepsilon\}) \times S$. Initially, we let $\Gamma'$ be the union $\Gamma_0$ of all $\Gamma_{a,q}$ for
$a \in \Sigma$, $q \in Q_L$, and we complete $\Gamma'$ iteratively by analyzing the different cases
of update rules of $\mathcal{R}/\mathcal{A}$. At each step, for each $a \in \Sigma$ and $q \in Q_L$, we let $B'_{a,q}$ be
the automaton $(P \cup Q_L, S, i_{a,q}, \{f_{a,q}\}, \Gamma')$. For the sake of conciseness we make no
distinction between an automaton $B'_{a,q}$ and its language $L(B'_{a,q})$.

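$\mathsf{REN}$: for every $a(x) \to b(x) \in \mathcal{R}/\mathcal{A}$ and $q \in Q_L$, we add two ε-transitions
$(i_{b,q}, \varepsilon, i_{a,q})$ and $(f_{a,q}, \varepsilon, f_{b,q})$ to $\Gamma'$.

$\mathsf{INS}_{\mathsf{first}}$: for every $a(x) \to a(p\,x) \in \mathcal{R}/\mathcal{A}$ and $q \in Q_L$, we add one looping
transition $(i_{a,q}, p, i_{a,q})$ to $\Gamma'$.

$\mathsf{INS}_{\mathsf{last}}$: for every $a(x) \to a(x\,p) \in \mathcal{R}/\mathcal{A}$ and $q \in Q_L$, we add one looping
transition $(f_{a,q}, p, f_{a,q})$ to $\Gamma'$.

$\mathsf{INS}_{\mathsf{into}}$: for every $a(xy) \to a(x\,p\,y) \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$ and $s \in S$ reachable from
$i_{a,q}$ using the transitions of $\Gamma'$, we add one looping transition $(s, p, s)$
to $\Gamma'$.

$\mathsf{INS}_{\mathsf{left}}$: for every $a(x) \to p\,a(x) \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$ and state $s \in S$ such that
$L(B'_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one looping
transition $(s, p, s)$ to $\Gamma'$.

$\mathsf{INS}_{\mathsf{right}}$: for every $a(x) \to a(x)\,p \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$ and $s' \in S$ such that
$L(B'_{a,q}) \neq \emptyset$ and there exists a transition $(s, q, s') \in \Gamma'$, we add one looping
transition $(s', p, s')$ to $\Gamma'$.

$\mathsf{RPL}$: for every $a(x) \to p \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$, and $s, s' \in S$ such that $L(B'_{a,q}) \neq \emptyset$
and there exists a transition $(s, q, s') \in \Gamma'$, we add one transition $(s, p, s')$
to $\Gamma'$.

$\mathsf{DEL}$: for every $a(x) \to () \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$, and $s, s' \in S$ such that $L(B'_{a,q}) \neq \emptyset$
and there exists a transition $(s, q, s') \in \Gamma'$, we add one ε-transition $(s, \varepsilon, s')$
to $\Gamma'$.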

We iterate the above operations until a fixpoint is reached (only a finite
number of transitions can be added to $\Gamma'$ this way). Finally, we let
$\Delta' := \Theta \cup \{ a(B'_{a,q}) \to q \mid a \in \Sigma,\ q \in Q_L,\ L(B'_{a,q}) \neq \emptyset \}$. Let us show now that
$L(\mathcal{A}') = \mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$.
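
The completion of $\Gamma'$ is a plain saturation loop, which the following sketch spells out. It is an illustration under stated assumptions rather than the construction itself: horizontal transitions are triples `(s, x, s2)` with `x = None` for ε, `init` and `final` map every pair `(a, q)` to the states $i_{a,q}$ and $f_{a,q}$, rules are triples `(kind, a, arg)`, and the test $L(B'_{a,q}) \neq \emptyset$ is approximated by path reachability from $i_{a,q}$ to $f_{a,q}$ (which presumes that every state of $P \cup Q_L$ recognizes at least one term). All identifiers are hypothetical.

```python
EPS = None  # label used for epsilon-transitions

def reachable(gamma, start):
    """States reachable from `start` through any transitions of gamma."""
    seen, todo = {start}, [start]
    while todo:
        s = todo.pop()
        for (u, _, v) in gamma:
            if u == s and v not in seen:
                seen.add(v)
                todo.append(v)
    return seen

def saturate(gamma0, init, final, rules, q_states):
    """Add transitions for each update rule until nothing new appears."""
    gamma = set(gamma0)
    while True:
        new = set()
        for kind, a, arg in rules:
            for q in q_states:
                i_aq, f_aq = init[a, q], final[a, q]
                nonempty = f_aq in reachable(gamma, i_aq)
                via_q = [(s, s2) for (s, x, s2) in gamma if x == q]
                if kind == "REN":            # a(x) -> arg(x)
                    new |= {(init[arg, q], EPS, i_aq), (f_aq, EPS, final[arg, q])}
                elif kind == "INS_first":    # a(x) -> a(arg x)
                    new.add((i_aq, arg, i_aq))
                elif kind == "INS_last":     # a(x) -> a(x arg)
                    new.add((f_aq, arg, f_aq))
                elif kind == "INS_into":     # a(xy) -> a(x arg y)
                    new |= {(s, arg, s) for s in reachable(gamma, i_aq)}
                elif nonempty and kind == "INS_left":   # a(x) -> arg a(x)
                    new |= {(s, arg, s) for (s, _) in via_q}
                elif nonempty and kind == "INS_right":  # a(x) -> a(x) arg
                    new |= {(s2, arg, s2) for (_, s2) in via_q}
                elif nonempty and kind == "RPL":        # a(x) -> arg
                    new |= {(s, arg, s2) for (s, s2) in via_q}
                elif nonempty and kind == "DEL":        # a(x) -> ()
                    new |= {(s, EPS, s2) for (s, s2) in via_q}
        if new <= gamma:
            return gamma
        gamma |= new

def accepts(gamma, start, end, word):
    """Does the automaton with transitions gamma read `word` from start to end?"""
    def closure(states):
        seen, todo = set(states), list(states)
        while todo:
            s = todo.pop()
            for (u, x, v) in gamma:
                if u == s and x is EPS and v not in seen:
                    seen.add(v)
                    todo.append(v)
        return seen
    current = closure({start})
    for sym in word:
        current = closure({v for (u, x, v) in gamma if u in current and x == sym})
    return end in current

# Toy instance: L = { r(a) }, i.e. r(qa) -> qr and a() -> qa.
Q_L = ["qr", "qa"]
init = {(l, q): f"i_{l}_{q}" for l in "rab" for q in Q_L}
final = {(l, q): f"f_{l}_{q}" for l in "rab" for q in Q_L}
gamma0 = {(init["r", "qr"], "qa", final["r", "qr"]),
          (init["a", "qa"], EPS, final["a", "qa"])}
rules = [("REN", "a", "b"), ("INS_first", "a", "p"), ("DEL", "a", None)]
gamma = saturate(gamma0, init, final, rules, Q_L)
print(accepts(gamma, init["r", "qr"], final["r", "qr"], ()))
```

In the toy instance, the $\mathsf{DEL}$ rule makes the empty child sequence acceptable below `r`, and the combination of $\mathsf{INS}_{\mathsf{first}}$ and $\mathsf{REN}$ lets a `b` node carry any number of `p` children.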

Lemma 2 $L(\mathcal{A}') \subseteq \mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$.

Proof. We show more generally that for all $t \in L(\mathcal{A}', q)$, $q \in Q_L$, there exists
$u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$. The proof is by induction on the multiset $\mathcal{M}$
of the applications of horizontal transitions of $\Gamma'$ not in $\Gamma_0$ in a run of $\mathcal{A}'$ on $t$
leading to state $q$.

Base case. If all the horizontal transitions are in $\Gamma_0$, then by construction
$t \in L(\mathcal{A}_L, q)$ and we are done.

Induction step. We analyse the cases causing the addition of a transition of
$\Gamma' \setminus \Gamma_0$.

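$\mathsf{REN}$: let $t \in L(\mathcal{A}', q)$ ($q \in Q_L$), and assume that an ε-transition $(i_{b,q_0}, \varepsilon, i_{a,q_0})$
is used in a run of $\mathcal{A}'$ on $t$, and that this ε-transition was added to $\Gamma'$ because
$a(x) \to b(x) \in \mathcal{R}/\mathcal{A}$. Let

$t = t[b(h)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$

be a reduction of $\mathcal{A}'$ such that the above ε-transition is involved in the step
$t[b(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0]$, where the transition $b(B'_{b,q_0}) \to q_0$ is applied. Hence
$q_1 \ldots q_n \in L(B'_{b,q_0})$, with $i_{b,q_0} \xrightarrow[B'_{b,q_0}]{q_1 \ldots q_n} f_{b,q_0}$, and the first step in this computation
is $(i_{b,q_0}, \varepsilon, i_{a,q_0})$. The last step must be $(f_{a,q_0}, \varepsilon, f_{b,q_0})$, using an ε-transition added
to $\Gamma'$ in the same step as $(i_{b,q_0}, \varepsilon, i_{a,q_0})$. By deleting these first and last steps, we
get $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q_0}$, hence $q_1 \ldots q_n \in L(B'_{a,q_0})$. Therefore, we have a reduction

$t' = t[a(h)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$

(hence $t' \in L(\mathcal{A}', q)$) with a measure $\mathcal{M}$ strictly smaller than the above
reduction for the recognition of $t$. By induction hypothesis, it follows that
there exists $u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t'$. Since
$t' = t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[b(h)] = t$, with $a(x) \to b(x)$, we conclude
that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.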

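$\mathsf{INS}_{\mathsf{first}}$: let $t \in L(\mathcal{A}', q)$ ($q \in Q_L$), and assume that a transition $(i_{a,q_0}, p, i_{a,q_0})$
is used in a run of $\mathcal{A}'$ on $t$, and that this transition was added to $\Gamma'$ because
$a(x) \to a(p\,x) \in \mathcal{R}/\mathcal{A}$. Let

$t = t[a(t_p\, h)] \xrightarrow[\mathcal{A}']{*} t[a(p\, q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$

be a reduction of $\mathcal{A}'$, with $t_p \in L(\mathcal{A}, p)$, such that the above transition is involved
in the step $t[a(p\, q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0]$, where the transition $a(B'_{a,q_0}) \to q_0$ is
applied. Hence $p\, q_1 \ldots q_n \in L(B'_{a,q_0})$, with $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{p\, q_1 \ldots q_n} f_{a,q_0}$, and the first
step in this computation is $(i_{a,q_0}, p, i_{a,q_0})$. By deleting this first step, we get
$i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q_0}$, hence $q_1 \ldots q_n \in L(B'_{a,q_0})$. Therefore, we have a reduction

$t' = t[a(h)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$

(hence $t' \in L(\mathcal{A}', q)$) with a measure $\mathcal{M}$ strictly smaller than the above
reduction for the recognition of $t$. By induction hypothesis, it follows that
there exists $u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t'$. Since
$t' = t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[a(t_p\, h)] = t$, with $a(x) \to a(p\,x)$, we conclude
that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.

$\mathsf{INS}_{\mathsf{last}}$: this case is similar to the previous one.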

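$\mathsf{INS}_{\mathsf{into}}$: let $t \in L(\mathcal{A}', q)$ ($q \in Q_L$), and assume that a transition $(s, p, s)$
is used in a run of $\mathcal{A}'$ on $t$, and that this transition was added to $\Gamma'$ because
$a(xy) \to a(x\,p\,y) \in \mathcal{R}/\mathcal{A}$. Let

$t = t[a(h\, t_p\, \ell)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n\, p\, q'_1 \ldots q'_m)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$

be a reduction of $\mathcal{A}'$, with $t_p \in L(\mathcal{A}, p)$, such that the above transition $(s, p, s)$
is involved in the step $t[a(q_1 \ldots q_n\, p\, q'_1 \ldots q'_m)] \xrightarrow[\mathcal{A}']{} t[q_0]$, where the transition
$a(B'_{a,q_0}) \to q_0$ is applied. More precisely, assume that $q_1 \ldots q_n\, p\, q'_1 \ldots q'_m \in L(B'_{a,q_0})$
because

$i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} s \xrightarrow[B'_{a,q_0}]{p} s \xrightarrow[B'_{a,q_0}]{q'_1 \ldots q'_m} f_{a,q_0}.$

By deleting the middle step $(s, p, s)$, we get $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n q'_1 \ldots q'_m} f_{a,q_0}$, hence
$q_1 \ldots q_n q'_1 \ldots q'_m \in L(B'_{a,q_0})$. Therefore, we have a reduction
$t' = t[a(h\,\ell)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n q'_1 \ldots q'_m)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$
(hence $t' \in L(\mathcal{A}', q)$) with a measure $\mathcal{M}$ strictly smaller than the
above reduction for the recognition of $t$. By induction hypothesis, it follows
that there exists $u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t'$. Since
$t' = t[a(h\,\ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[a(h\, t_p\, \ell)] = t$, with $a(xy) \to a(x\,p\,y)$, we conclude
that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.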

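$\mathsf{INS}_{\mathsf{left}}$: let $t \in L(\mathcal{A}', q)$ ($q \in Q_L$), and assume that a transition $(s, p, s)$
is used in a run of $\mathcal{A}'$ on $t$, and that this transition was added to $\Gamma'$ because
$a(x) \to p\,a(x) \in \mathcal{R}/\mathcal{A}$ and because there exists $(s, q_0, s') \in \Gamma'$ for some $q_0 \in Q_L$
with $L(B'_{a,q_0}) \neq \emptyset$. Let

$t = t[t_p\, a(h)] \xrightarrow[\mathcal{A}']{*} t[p\, q_0] \xrightarrow[\mathcal{A}']{*} q$

be a reduction of $\mathcal{A}'$, with $t_p \in L(\mathcal{A}, p)$, involving the transition $(s, p, s)$ in
$s \xrightarrow[B'_{b,q'}]{p\, q_0} s'$, for some $b$ and $q'$. Removing the transition $(s, p, s)$, we have
$s \xrightarrow[B'_{b,q'}]{q_0} s'$ and a reduction $t' = t[a(h)] \xrightarrow[\mathcal{A}']{*} t[q_0] \xrightarrow[\mathcal{A}']{*} q$ (meaning
$t' \in L(\mathcal{A}', q)$) with a measure $\mathcal{M}$ strictly smaller than the above reduction
for the recognition of $t$. By induction hypothesis, it follows that there
exists $u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t'$. Since
$t' = t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[t_p\, a(h)] = t$, with $a(x) \to p\,a(x)$, we
conclude that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.

$\mathsf{INS}_{\mathsf{right}}$: this case is similar to the previous one.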

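$\mathsf{RPL}$: let $t \in L(\mathcal{A}', q)$ ($q \in Q_L$), and assume that a horizontal transition
$(s, p, s')$ is used in a run of $\mathcal{A}'$ on $t$, and that this transition was added to $\Gamma'$
because $a(x) \to p \in \mathcal{R}/\mathcal{A}$ and because there exists $(s, q_0, s') \in \Gamma'$ for some
$q_0 \in Q_L$ such that $L(B'_{a,q_0}) \neq \emptyset$. Let

$t = t[t_p] \xrightarrow[\mathcal{A}']{*} t[p] \xrightarrow[\mathcal{A}']{*} q$

be a reduction of $\mathcal{A}'$, with $t_p \in L(\mathcal{A}, p)$, involving the added transition $(s, p, s')$ in
$s \xrightarrow[B'_{b,q'}]{p} s'$, for some $b$ and some $q' \in Q_L$. Replacing the transition $(s, p, s')$ with
$(s, q_0, s')$, we obtain $s \xrightarrow[B'_{b,q'}]{q_0} s'$ and a reduction
$t' = t[a(h)] \xrightarrow[\mathcal{A}']{*} t[q_0] \xrightarrow[\mathcal{A}']{*} q$ (meaning $t' \in L(\mathcal{A}', q)$). The measure of this
latter reduction is strictly smaller than the above reduction for the
recognition of $t$, because the transition $(s, q_0, s')$ belongs to $\Gamma_0$ (no such
transition can be added by the above procedure). By induction hypothesis, it
follows that there exists $u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t'$. Since
$t' = t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[t_p] = t$, with $a(x) \to p$, we conclude
that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.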

$\mathsf{DEL}$: let $t \in L(\mathcal{A}', q)$ ($q \in Q_L$), and assume that a horizontal transition
$(s, \varepsilon, s')$ is used in a run of $\mathcal{A}'$ on $t$, and that this transition was added to $\Gamma'$
because $a(x) \to () \in \mathcal{R}/\mathcal{A}$ and because there exists $(s, q_0, s') \in \Gamma'$ for some
$q_0 \in Q_L$ such that $L(B'_{a,q_0}) \neq \emptyset$. Replacing this ε-transition $(s, \varepsilon, s')$ with
$(s, q_0, s')$ in a reduction $t \xrightarrow[\mathcal{A}']{*} q$, we obtain a reduction

$t' = t[a(h)] \xrightarrow[\mathcal{A}']{*} t[q_0] \xrightarrow[\mathcal{A}']{*} q.$

It means that $t' \in L(\mathcal{A}', q)$. The measure $\mathcal{M}$ of this latter reduction is strictly
smaller than the above reduction for the recognition of $t$, because the transition
$(s, q_0, s')$ belongs to $\Gamma_0$ (no such transition can be added by the above
procedure). By induction hypothesis, it follows that there exists $u \in L(\mathcal{A}_L, q)$ such
that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t'$. Since $t' = t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t$, with $a(x) \to ()$, we conclude that
$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.

(end Lemma direction $\subseteq$) □

Lemma 3 $L(\mathcal{A}') \supseteq \mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$.

Proof. We show that for all $u \in L$, if $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$, then $t \in L(\mathcal{A}')$, by induction
on the length of the rewrite sequence.

Base case (0 rewrite steps). In this case, $t = u \in L$ and we are done since
$L = L(\mathcal{A}_L) \subseteq L(\mathcal{A}')$ by construction.

Induction step. Assume that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{+} t$ with $u \in L$. We analyse the type of
rewrite rule used in the last rewrite step.



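$\mathsf{REN}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to b(x) \in \mathcal{R}/\mathcal{A}$:

$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[b(h)] = t.$

By induction hypothesis, $t[a(h)] \in L(\mathcal{A}')$. Hence there exists a reduction
sequence $t[a(h)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$ with
$q_1 \ldots q_n \in L(B'_{a,q_0})$, i.e. $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q_0}$. By construction, the ε-transitions
$(i_{b,q_0}, \varepsilon, i_{a,q_0})$ and $(f_{a,q_0}, \varepsilon, f_{b,q_0})$ have been added to $\Gamma'$. Hence
$i_{b,q_0} \xrightarrow[B'_{b,q_0}]{q_1 \ldots q_n} f_{b,q_0}$ and $q_1 \ldots q_n \in L(B'_{b,q_0})$. Therefore there exists a reduction sequence
$t = t[b(h)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$, and $t \in L(\mathcal{A}')$.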

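$\mathsf{INS}_{\mathsf{first}}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to a(p\,x) \in \mathcal{R}/\mathcal{A}$, with $p \in P$:

$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[a(t_p\, h)] = t$

with $t_p \in L(\mathcal{A}, p)$. By induction hypothesis, $t[a(h)] \in L(\mathcal{A}')$. Hence there exists
a reduction sequence $t[a(h)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$ with
$q_1 \ldots q_n \in L(B'_{a,q_0})$, i.e. $i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q_0}$. By construction, the transition
$(i_{a,q_0}, p, i_{a,q_0})$ has been added to $\Gamma'$. Hence
$i_{a,q_0} \xrightarrow[B'_{a,q_0}]{p} i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} f_{a,q_0}$, i.e.
$p\, q_1 \ldots q_n \in L(B'_{a,q_0})$, and there exists a reduction sequence

$t = t[a(t_p\, h)] \xrightarrow[\mathcal{A}']{*} t[a(p\, q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f.$

It follows that $t \in L(\mathcal{A}')$.

$\mathsf{INS}_{\mathsf{last}}$. The case where the last rewrite step of the sequence involves a rewrite
rule of the form $a(x) \to a(x\,p) \in \mathcal{R}/\mathcal{A}$, with $p \in P$, is similar to the previous
one.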



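$\mathsf{INS}_{\mathsf{into}}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(xy) \to a(x\,p\,y) \in \mathcal{R}/\mathcal{A}$, with $p \in P$:

$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t[a(h\,\ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[a(h\, t_p\, \ell)] = t$

with $t_p \in L(\mathcal{A}, p)$. By induction hypothesis, $t[a(h\,\ell)] \in L(\mathcal{A}')$. Hence there exists
a reduction sequence
$t[a(h\,\ell)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n q'_1 \ldots q'_m)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$
with $q_1 \ldots q_n q'_1 \ldots q'_m \in L(B'_{a,q_0})$, i.e.
$i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} s \xrightarrow[B'_{a,q_0}]{q'_1 \ldots q'_m} f_{a,q_0}$ for some
state $s \in S$. By construction, the looping transition $(s, p, s)$ has been added
to $\Gamma'$. Hence
$i_{a,q_0} \xrightarrow[B'_{a,q_0}]{q_1 \ldots q_n} s \xrightarrow[B'_{a,q_0}]{p} s \xrightarrow[B'_{a,q_0}]{q'_1 \ldots q'_m} f_{a,q_0}$, i.e.
$q_1 \ldots q_n\, p\, q'_1 \ldots q'_m \in L(B'_{a,q_0})$, and there exists a reduction sequence

$t = t[a(h\, t_p\, \ell)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n\, p\, q'_1 \ldots q'_m)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f.$

It follows that $t \in L(\mathcal{A}')$.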



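$\mathsf{INS}_{\mathsf{left}}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to p\,a(x) \in \mathcal{R}/\mathcal{A}$, with $p \in P$:

$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[t_p\, a(h)] = t$

with $t_p \in L(\mathcal{A}, p)$. By induction hypothesis, $t[a(h)] \in L(\mathcal{A}')$. Hence there exists
a reduction sequence $t[a(h)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$. Hence
$L(B'_{a,q_0}) \neq \emptyset$ and at some point of the reduction, a transition $(s, q_0, s') \in \Gamma'$ is
involved. By construction, the transition $(s, p, s)$ has been added to $\Gamma'$. Hence
there exists a reduction sequence $t = t[t_p\, a(h)] \xrightarrow[\mathcal{A}']{*} t[p\, q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$. It
follows that $t \in L(\mathcal{A}')$.

$\mathsf{INS}_{\mathsf{right}}$. The case where the last rewrite step of the sequence involves a rewrite
rule of the form $a(x) \to a(x)\,p \in \mathcal{R}/\mathcal{A}$, with $p \in P$, is similar to the previous
one.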



$\mathsf{RPL}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to p \in \mathcal{R}/\mathcal{A}$, with $p \in P$:

$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[t_p] = t$

with $t_p \in L(\mathcal{A}, p)$. By induction hypothesis, $t[a(h)] \in L(\mathcal{A}')$. Hence there exists
a reduction sequence $t[a(h)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$. Hence
$L(B'_{a,q_0}) \neq \emptyset$ and at some point of the reduction, a transition $(s, q_0, s') \in \Gamma'$
is applied. By construction, the transition $(s, p, s')$ has been added to $\Gamma'$, and
there exists a reduction sequence $t = t[t_p] \xrightarrow[\mathcal{A}']{*} t[p] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$. It follows that
$t \in L(\mathcal{A}')$.

$\mathsf{DEL}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to () \in \mathcal{R}/\mathcal{A}$.
By induction hypothesis, $t[a(h)] \in L(\mathcal{A}')$. Hence there exists a reduction
sequence $t[a(h)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$. Hence $L(B'_{a,q_0}) \neq \emptyset$
and at some point of the reduction, a transition $(s, q_0, s') \in \Gamma'$ is applied. By
construction, the ε-transition $(s, \varepsilon, s')$ has been added to $\Gamma'$, and there exists a
reduction sequence $t \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$, hence $t \in L(\mathcal{A}')$.

(end Lemma direction $\supseteq$) □
(end of the proof of Theorem 1) □



C Appendix: proof of Theorem 2

Theorem 2 Given a HA $\mathcal{A}$ on $\Sigma$ and a PTRS $\mathcal{R}/\mathcal{A} \in \mathrm{XACU}+$, for all CF-HA
term languages $L$, $\mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$ is the language of a CF-HA of polynomial size
which can be constructed in PTIME in the sizes of $\mathcal{R}$, $\mathcal{A}$ and a CF-HA
recognizing $L$.

Proof. Let $\mathcal{A} = (P, P^f, \Theta)$ and let us assume that it is normalized. Let
$\mathcal{A}_L = (Q_L, Q_L^f, \Delta_L)$ be a CF-HA recognizing $L$, normalized and without collapsing
transitions (this can be assumed thanks to Lemma 1). The state sets $P$ and
$Q_L$ are assumed disjoint. We shall construct a CF-HA extended with collapsing
transitions $\mathcal{A}' = (P \uplus Q_L, Q_L^f, \Delta')$ recognizing $\mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$. The set of transitions
$\Delta'$ is constructed starting from $\Delta_L \cup \Theta$ and analysing the different cases of
update rules.

For each $a \in \Sigma$, $q \in Q_L$, let $L_{a,q}$ be the context-free language in the transition
(assumed unique) $a(L_{a,q}) \to q \in \Delta_L$, and let $\mathcal{G}_{a,q} = (Q_L, N_{a,q}, I_{a,q}, \Gamma_{a,q})$ be a
CF grammar in Chomsky normal form generating $L_{a,q}$. It has alphabet (set of
terminal symbols) $Q_L$, set of non-terminal symbols $N_{a,q}$, initial non-terminal
$I_{a,q} \in N_{a,q}$, and set of production rules $\Gamma_{a,q}$. The sets of non-terminals $N_{a,q}$ are
assumed pairwise disjoint.

Let us consider one new non-terminal $I'_{a,q}$ for each $a \in \Sigma$ and $q \in Q_L$. Each
of these non-terminals aims at becoming the initial non-terminal of the CF
grammar in the transition associated to $a$ and $q$ in $\mathcal{A}'$. For technical convenience,
we also add one new non-terminal $X_p$ for each $p \in P$. For the construction of
$\mathcal{A}'$, we shall construct below a set $C'$ of collapsing transitions, initially empty,
and a set $\Gamma'$ of production rules of CF grammar over the set of terminal symbols
$P \cup Q_L$ and the set of non-terminals

$\mathcal{N} = \bigcup_{a \in \Sigma,\, q \in Q_L} \big(N_{a,q} \cup \{I'_{a,q}\}\big) \cup \{X_p \mid p \in P\}.$

Initially, we let $\Gamma' = \Gamma_0 := \bigcup_{a \in \Sigma,\, q \in Q_L} \big(\Gamma_{a,q} \cup \{I'_{a,q} := I_{a,q}\}\big) \cup \{X_p := p \mid p \in P\}$.

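We now proceed by analysis of the rewrite rules of $\mathcal{R}/\mathcal{A}$ for the completion
of $\Gamma'$ and $C'$. At each step, for each $a \in \Sigma$ and $q \in Q_L$, we let $\mathcal{G}'_{a,q}$ be the CF
grammar $(P \cup Q_L, \mathcal{N}, I'_{a,q}, \Gamma')$, and let $L'_{a,q} = L(\mathcal{G}'_{a,q})$. The production rules of
$\Gamma'$ remain in Chomsky normal form after each completion step.

$\mathsf{REN}$: for every $a(x) \to b(x) \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$, we add one production rule
$I'_{b,q} := I'_{a,q}$ to $\Gamma'$.

$\mathsf{INS}'_{\mathsf{first}}$: for every $a(x) \to b(p\,x) \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$, we add one production rule
$I'_{b,q} := X_p\, I'_{a,q}$ to $\Gamma'$.

$\mathsf{INS}'_{\mathsf{last}}$: for every $a(x) \to b(x\,p) \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$, we add one production rule
$I'_{b,q} := I'_{a,q}\, X_p$ to $\Gamma'$.

$\mathsf{INS}_{\mathsf{into}}$: for every $a(xy) \to a(x\,p\,y) \in \mathcal{R}/\mathcal{A}$, $q \in Q_L$ and every $N \in \mathcal{N}$ reachable
from $I'_{a,q}$ using the rules of $\Gamma'$, we add two production rules $N := N X_p$
and $N := X_p N$.

$\mathsf{INS}_{\mathsf{left}}$: for every $a(x) \to p\,a(x) \in \mathcal{R}/\mathcal{A}$, and $q \in Q_L$ such that $L'_{a,q} \neq \emptyset$, we
add one collapsing transition $p\,q \to q$ to $C'$.

$\mathsf{INS}_{\mathsf{right}}$: for every $a(x) \to a(x)\,p \in \mathcal{R}/\mathcal{A}$, and $q \in Q_L$ such that $L'_{a,q} \neq \emptyset$, we
add one collapsing transition $q\,p \to q$ to $C'$.

$\mathsf{RPL}'$: for every $a(x) \to p_1 \ldots p_n \in \mathcal{R}/\mathcal{A}$, with $n > 0$, and $q \in Q_L$ such that
$L'_{a,q} \neq \emptyset$, we add one collapsing transition $p_1 \ldots p_n \to q$ to $C'$.

$\mathsf{DEL}$: for every $a(x) \to () \in \mathcal{R}/\mathcal{A}$ and $q \in Q_L$ such that $L'_{a,q} \neq \emptyset$, we add one
collapsing transition $() \to q$ to $C'$.

Note that $\mathsf{INS}_{\mathsf{first}}$, $\mathsf{INS}_{\mathsf{last}}$, $\mathsf{RPL}$ are special cases of respectively $\mathsf{INS}'_{\mathsf{first}}$,
$\mathsf{INS}'_{\mathsf{last}}$, $\mathsf{RPL}'$.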

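We iterate the above operations until a fixpoint is reached. Indeed, only a
finite number of production and collapsing rules can be added. Finally, we let

$\Delta' := \Theta \cup \{a(L'_{a,q}) \to q \mid a \in \Sigma,\ q \in Q_L,\ L'_{a,q} \neq \emptyset\} \cup C' \cup \{L'_{a,q} \to q \mid a(x) \to x \in \mathcal{R}/\mathcal{A},\ L'_{a,q} \neq \emptyset\}.$

We show that $L(\mathcal{A}') = \mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$. It follows that $\mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$ is a CF-HA
language by Lemma 1.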
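
The fixpoint computation can be sketched in the same spirit as for Theorem 1, this time on grammars. The sketch below covers only part of the construction ($\mathsf{INS}_{\mathsf{into}}$ and the $\mathsf{DEL}_s$ transitions are left out for brevity), encodes non-terminals as tuples such as `("I'", a, q)` and `("X", p)`, and tests $L'_{a,q} \neq \emptyset$ with the standard "productive non-terminal" computation. The encoding and all identifiers are assumptions made for illustration.

```python
def productive(prods, terminals):
    """Non-terminals that derive at least one word of terminals."""
    good, changed = set(), True
    while changed:
        changed = False
        for n, rhss in prods.items():
            if n not in good and any(
                all(s in terminals or s in good for s in rhs) for rhs in rhss
            ):
                good.add(n)
                changed = True
    return good

def complete(prods, rules, q_states, terminals):
    """Saturate productions and collapsing transitions.
    Rules are tuples (kind, a, b, p); unused fields are None."""
    prods = {n: set(rhss) for n, rhss in prods.items()}
    collapsing = set()   # pairs (word of states, target state)
    while True:
        size = sum(len(r) for r in prods.values()) + len(collapsing)
        good = productive(prods, terminals)
        for kind, a, b, p in rules:
            for q in q_states:
                start_a, start_b = ("I'", a, q), ("I'", b, q)
                if kind == "REN":            # a(x) -> b(x)
                    prods.setdefault(start_b, set()).add((start_a,))
                elif kind == "INS_first":    # a(x) -> b(p x)
                    prods.setdefault(start_b, set()).add((("X", p), start_a))
                elif kind == "INS_last":     # a(x) -> b(x p)
                    prods.setdefault(start_b, set()).add((start_a, ("X", p)))
                elif start_a in good:        # side condition L'_{a,q} nonempty
                    if kind == "INS_left":   # a(x) -> p a(x)
                        collapsing.add(((p, q), q))
                    elif kind == "INS_right":  # a(x) -> a(x) p
                        collapsing.add(((q, p), q))
                    elif kind == "RPL":      # a(x) -> p_1 ... p_n
                        collapsing.add((tuple(p), q))
                    elif kind == "DEL":      # a(x) -> ()
                        collapsing.add(((), q))
        if sum(len(r) for r in prods.values()) + len(collapsing) == size:
            return prods, collapsing

# Toy instance: L'_{b,q} is empty at first and becomes non-empty only after
# the REN rule a(x) -> b(x) has been processed, which then enables DEL on b.
terminals = {"p", "q1", "q"}
prods0 = {
    ("I'", "a", "q"): {(("I", "a", "q"),)},
    ("I", "a", "q"): {("q1",)},
    ("I'", "b", "q"): {(("I", "b", "q"),)},   # I_{b,q} has no production
    ("X", "p"): {("p",)},
}
rules = [("DEL", "b", None, None), ("REN", "a", "b", None)]
prods, collapsing = complete(prods0, rules, ["q"], terminals)
print(collapsing)
```

The toy instance shows why a single pass over the rules is not enough: the side condition of $\mathsf{DEL}$ on `b` only holds after $\mathsf{REN}$ has been applied, so the loop has to run until nothing changes.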

Lemma 4 $L(\mathcal{A}') \subseteq \mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$.

Proof. We show more generally that for all $t \in L(\mathcal{A}', q)$, $q \in Q_L$, there exists
$u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$. The proof is by induction on the number of
applications of collapsing transitions in the reduction $t \xrightarrow[\mathcal{A}']{*} q$.

Base case. For the base case (no collapsing transition applied), we make a
second induction on the number of applications of production rules of $\Gamma' \setminus \Gamma_0$
in the derivations, by the grammars $\mathcal{G}'_{a,q_0}$, for the generation of the sequences
of states $q_1 \ldots q_n \in Q^*$ used in moves of $\mathcal{A}'$ of the form $u[a(q_1 \ldots q_n)] \to u[q_0]$
in the reduction $t \xrightarrow[\mathcal{A}']{*} q$. Let us denote by $\vdash$ the relation of derivation using the
production rules of $\Gamma'$, and by $\vdash^*$ its transitive closure.

Intuitively every application of a production rule of $\Gamma' \setminus \Gamma_0$ corresponds to
a rewrite step with a rule of $\mathcal{R}/\mathcal{A}$ in the rewrite sequence $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$, according
to the above construction cases.

Base case (second induction). For the base case, no production rule of
$\Gamma' \setminus \Gamma_0$ is applied. It means that $t \xrightarrow[\mathcal{A}_L]{*} q$ (every CF grammar derivation in the
reduction $t \xrightarrow[\mathcal{A}']{*} q$ starts with $I'_{a,q} \vdash I_{a,q}$) and we let $u = t$.

Induction step (second induction). Assume that the reduction $t \xrightarrow[\mathcal{A}']{*} q$
has the form

$t = t[a(t_1 \ldots t_n)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$

where $t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0]$ is one transition such that the derivation of
$q_1 \ldots q_n$ by $\mathcal{G}'_{a,q_0}$ involves one production rule of $\Gamma' \setminus \Gamma_0$. We shall
analyse below the different cases of rewrite rules of $\mathcal{R}/\mathcal{A}$ (rules of type XACU+)
which permitted the addition of this production rule of $\Gamma' \setminus \Gamma_0$. Let us first note
that we can assume that for every $i \le n$, $t_i \xrightarrow[\mathcal{A}']{*} q_i$, because no collapsing
transitions are used, by hypothesis. Hence, together with the above hypothesis,
it follows that $t_i \in L(\mathcal{A}', q_i)$ for all $i \le n$.

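Case $\mathsf{REN}$. We have $I'_{a,q_0} \vdash I'_{b,q_0} \vdash^* q_1 \ldots q_n$, and the first production rule
used in this derivation, $I'_{a,q_0} := I'_{b,q_0}$, was added because there exists a rule
$b(x) \to a(x) \in \mathcal{R}/\mathcal{A}$. It follows that $I'_{b,q_0} \vdash^* q_1 \ldots q_n$ and then that

$s = t[b(t_1 \ldots t_n)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q.$

Hence, by induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} s$.
Moreover, $s = t[b(t_1 \ldots t_n)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t = t[a(t_1 \ldots t_n)]$ using $b(x) \to a(x) \in \mathcal{R}/\mathcal{A}$.
Hence $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.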

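Case $\mathsf{INS}'_{\mathsf{first}}$. We have $I'_{b,q_0} \vdash X_p\, I'_{a,q_0} \vdash^* q_1 \ldots q_n$, and the first production
rule used in this derivation, $I'_{b,q_0} := X_p\, I'_{a,q_0}$, was added because there exists
a rule $a(x) \to b(p\,x) \in \mathcal{R}/\mathcal{A}$. By construction, it follows that $q_1 = p$ and
$I'_{a,q_0} \vdash^* q_2 \ldots q_n$, and

$s = t[a(t_2 \ldots t_n)] \xrightarrow[\mathcal{A}']{*} t[a(q_2 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q.$

By induction hypothesis, applied to the above reduction, there exists $u \in L(\mathcal{A}_L, q)$
such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} s$. Moreover, $s = t[a(t_2 \ldots t_n)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t = t[b(t_1 \ldots t_n)]$ using
$a(x) \to b(p\,x) \in \mathcal{R}/\mathcal{A}$, because $t_1 \in L(\mathcal{A}, p)$. Hence $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.

Case $\mathsf{INS}'_{\mathsf{last}}$. This case is similar to the previous one.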

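Case $\mathsf{INS}_{\mathsf{into}}$. We have $I'_{a,q_0} \vdash^* \alpha N \beta \vdash \alpha N X_p \beta \vdash \alpha N p\, \beta \vdash^* q_1 \ldots q_n$, and the
production $N := N X_p$ was added because there exists a rule $a(xy) \to a(x\,p\,y) \in \mathcal{R}/\mathcal{A}$,
and $N$ is reachable from $I'_{a,q_0}$ using $\Gamma'$. It follows that there exist two
integers $k < \ell \le n$ such that $\alpha \vdash^* q_1 \ldots q_k$ and $N X_p \vdash^* q_{k+1} \ldots q_\ell$ (hence
$q_\ell = p$) and $\beta \vdash^* q_{\ell+1} \ldots q_n$ (if $\ell = n$ then this latter sequence is empty), and

$s = t[a(t_1 \ldots t_{\ell-1} t_{\ell+1} \ldots t_n)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_{\ell-1} q_{\ell+1} \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q.$

By induction hypothesis, applied to the above reduction, there exists $u \in L(\mathcal{A}_L, q)$
such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} s$. Moreover,
$s = t[a(t_1 \ldots t_{\ell-1} t_{\ell+1} \ldots t_n)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t = t[a(t_1 \ldots t_n)]$
using the rewrite rule $a(xy) \to a(x\,p\,y)$, because $t_\ell \in L(\mathcal{A}, p)$.
Hence $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.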

Induction step (first induction). Assume that the reduction $t \xrightarrow[\mathcal{A}']{*} q$ has
the form

$t = t[t_1 \ldots t_n] \xrightarrow[\mathcal{A}']{*} t[q_1 \ldots q_n] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q \qquad (1)$

such that there exists a collapsing transition $L' \to q_0 \in \Delta'$ with $q_1 \ldots q_n \in L'$
and the first part of the reduction, $t \xrightarrow[\mathcal{A}']{*} t[q_1 \ldots q_n]$, involves no collapsing
transition. It implies in particular that $t_i \in L(\mathcal{A}', q_i)$ for all $i \le n$.

The collapsing transition $L' \to q_0$ belongs to $C'$ (by hypothesis $\mathcal{A}_L$ and $\mathcal{A}$ do
not contain collapsing transitions) and was added because of a rewrite rule of
$\mathcal{R}/\mathcal{A}$ in XACU+. We consider below the different possible cases for this addition.



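Case $\mathsf{INS}_{\mathsf{left}}$. We have $n = 2$, $q_1 = p \in P$, $q_2 = q_0$ and the collapsing transition
$p\, q_0 \to q_0$ has been added because there exists a rule $a(x) \to p\,a(x) \in \mathcal{R}/\mathcal{A}$. In
this case, the reduction (1) is

$t = t[t_1 t_2] \xrightarrow[\mathcal{A}']{*} t[p\, q_0] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$

and we have $s = t[t_2] \xrightarrow[\mathcal{A}']{*} t[q_0] \xrightarrow[\mathcal{A}']{*} q$ because the first part of the reduction
uses no collapsing transition. By induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$
such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} s$. Moreover, $s \xrightarrow[\mathcal{R}/\mathcal{A}]{} t$ using the rewrite rule $a(x) \to p\,a(x)$,
because $t_1 \in L(\mathcal{A}, p)$. Hence $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.

Case $\mathsf{INS}_{\mathsf{right}}$. This case is similar to the previous one.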

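Case $\mathsf{RPL}'$. In this case, for all $i \le n$, $q_i = p_i \in P$ and the collapsing transition
$p_1 \ldots p_n \to q_0$ was added because there exists a rewrite rule $a(x) \to p_1 \ldots p_n \in \mathcal{R}/\mathcal{A}$
and $L'_{a,q_0} \neq \emptyset$. Hence there exists a term $a(h) \in L(\mathcal{A}', q_0)$, and

$s = t[a(h)] \xrightarrow[\mathcal{A}']{*} t[q_0] \xrightarrow[\mathcal{A}']{*} q.$

By induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} s$.
Moreover, using the rewrite rule $a(x) \to p_1 \ldots p_n$, $s \xrightarrow[\mathcal{R}/\mathcal{A}]{} t$ because $t_i \in L(\mathcal{A}, p_i)$ for
all $i \le n$. Hence $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.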



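Case $\mathsf{DEL}$. In this case, $n = 0$ and the collapsing transition $() \to q_0$ was added
to $C'$ because there exists a rewrite rule $a(x) \to () \in \mathcal{R}/\mathcal{A}$ and $L'_{a,q_0} \neq \emptyset$. Let
$a(h) \in L(\mathcal{A}', q_0)$; we have $s = t[a(h)] \xrightarrow[\mathcal{A}']{*} t[q_0] \xrightarrow[\mathcal{A}']{*} q$. By induction hypothesis,
there exists $u \in L(\mathcal{A}_L, q)$ such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} s$. Moreover, $s \xrightarrow[\mathcal{R}/\mathcal{A}]{} t$ using the
rewrite rule $a(x) \to ()$, and $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.

Case $\mathsf{DEL}_s$. In this last case, the collapsing transition $L'_{a,q_0} \to q_0$ was added
to $\Delta'$ because there exists a rewrite rule $a(x) \to x \in \mathcal{R}/\mathcal{A}$ and $L'_{a,q_0} \neq \emptyset$. We
have

$s = t[a(t_1 \ldots t_n)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$

because $q_1 \ldots q_n \in L'_{a,q_0}$. By induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$
such that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} s$. Moreover, $s \xrightarrow[\mathcal{R}/\mathcal{A}]{} t$ using the rewrite rule $a(x) \to x$, and
$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.

(end Lemma direction $\subseteq$) □

Lemma 5 $L(\mathcal{A}') \supseteq \mathit{post}^*_{\mathcal{R}/\mathcal{A}}(L)$.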

Proof. We show that for all $u \in L$, if $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$, then $t \in L(\mathcal{A}')$, by induction
on the length of the rewrite sequence.

Base case (0 rewrite steps). In this case, $u = t \in L$. We can note that
$L \subseteq L(\mathcal{A}')$ because $\Gamma'$ contains the production rule $I'_{a,q} := I_{a,q}$ for all $a \in \Sigma$,
$q \in Q_L$. Hence, $t \in L(\mathcal{A}')$.

Induction step ($k + 1$ rewrite steps). We analyse the type of rewrite rule
used in the last rewrite step of $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$.

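$\mathsf{REN}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to b(x) \in \mathcal{R}/\mathcal{A}$:

$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[b(h)] = t.$

By induction hypothesis, $u[a(h)] \in L(\mathcal{A}')$. Hence there exists a reduction
sequence $u[a(h)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$ with
$q_1 \ldots q_n \in L'_{a,q_0}$, i.e. $q_1 \ldots q_n$ can be generated by $\mathcal{G}'_{a,q_0}$, starting from $I'_{a,q_0}$ and
using the production rules of $\Gamma'$.

By construction, $\Gamma'$ contains the production rule $I'_{b,q_0} := I'_{a,q_0}$. Hence
$q_1 \ldots q_n \in L'_{b,q_0}$: it can be generated by $\mathcal{G}'_{b,q_0}$, starting from $I'_{b,q_0}$ and using the
production rules of $\Gamma'$.

Hence $t = u[b(h)] \xrightarrow[\mathcal{A}']{*} u[b(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$, i.e. $t \in L(\mathcal{A}')$.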

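$\mathsf{INS}'_{\mathsf{first}}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to b(p\,x) \in \mathcal{R}/\mathcal{A}$, with $p \in P$:

$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[b(t_p\, h)] = t$

with $t_p \in L(\mathcal{A}, p)$. By induction hypothesis, $u[a(h)] \in L(\mathcal{A}')$. Hence there exists
a reduction sequence $u[a(h)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$ with
$q_1 \ldots q_n \in L'_{a,q_0}$, i.e. $q_1 \ldots q_n$ can be generated by $\mathcal{G}'_{a,q_0}$, starting from $I'_{a,q_0}$ and
using the production rules of $\Gamma'$.

By construction, $\Gamma'$ contains the production rule $I'_{b,q_0} := X_p\, I'_{a,q_0}$. Hence
$p\, q_1 \ldots q_n$ is in $L'_{b,q_0}$. Hence
$t = u[b(t_p\, h)] \xrightarrow[\mathcal{A}']{*} u[b(p\, q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$,
i.e. $t \in L(\mathcal{A}')$.

$\mathsf{INS}'_{\mathsf{last}}$. This case is similar to the above one.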

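$\mathsf{INS}_{\mathsf{into}}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(xy) \to a(x\,p\,y) \in \mathcal{R}/\mathcal{A}$, with $p \in P$:

$u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u[a(h\,\ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[a(h\, t_p\, \ell)] = t$

with $t_p \in L(\mathcal{A}, p)$. By induction hypothesis, $u[a(h\,\ell)] \in L(\mathcal{A}')$. Hence there
exists a reduction sequence $u[a(h\,\ell)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$
with $q_1 \ldots q_n \in L'_{a,q_0}$, i.e. $q_1 \ldots q_n$ can be generated by $\mathcal{G}'_{a,q_0}$, starting from $I'_{a,q_0}$
and using the production rules of $\Gamma'$.

By construction, $\Gamma'$ contains the production rules $N := N X_p$ and $N := X_p N$ for
every non-terminal $N$ reachable from $I'_{a,q_0}$ using $\Gamma'$. Using one of these production
rules, it is possible to generate $q_1 \ldots q_j\, p\, q_{j+1} \ldots q_n$ with $\mathcal{G}'_{a,q_0}$, starting from
$I'_{a,q_0}$ and using the production rules of $\Gamma'$, where $j$ is the length of $h$. Hence
$t = u[a(h\, t_p\, \ell)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_j\, p\, q_{j+1} \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q_f \in Q_L^f$, and
$t \in L(\mathcal{A}')$.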

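$\mathsf{INS}_{\mathsf{left}}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to p\, a(x) \in \mathcal{R}/\mathcal{A}$, with $p \in P$:

$$u[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[t_p\, a(h)] = t$$

with $t_p \in L(\mathcal{A}, p)$. By induction hypothesis, $u[a(h)] \in L(\mathcal{A}')$. Hence there exists
a reduction sequence: $u[a(h)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q^f \in Q_L^f$ with
$q_1 \ldots q_n \in L'_{a,q_0}$.

By construction, $\mathcal{A}'$ contains a collapsing transition rule $p\, q_0 \to q_0$. Hence
$t = u[t_p\, a(h)] \xrightarrow[\mathcal{A}']{*} u[p\, q_0] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q^f \in Q_L^f$, i.e. $t \in L(\mathcal{A}')$.

$\mathsf{INS}_{\mathsf{right}}$. This case is similar to the previous one.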

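$\mathsf{RPL}'$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to p_1 \ldots p_n \in \mathcal{R}/\mathcal{A}$, with $p_1, \ldots, p_n \in P$:

$$u[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[t_1 \ldots t_n] = t$$

with $t_i \in L(\mathcal{A}, p_i)$ for all $i \le n$. By induction hypothesis, $u[a(h)] \in L(\mathcal{A}')$.
Hence there exists a reduction sequence: $u[a(h)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q^f \in Q_L^f$
with $q_1 \ldots q_n \in L'_{a,q_0}$.

Therefore, by construction, $\mathcal{A}'$ contains a collapsing transition rule $p_1 \ldots p_n \to q_0$.
Hence $t = u[t_1 \ldots t_n] \xrightarrow[\mathcal{A}']{*} u[p_1 \ldots p_n] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q^f \in Q_L^f$, i.e. $t \in L(\mathcal{A}')$.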

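$\mathsf{DEL}$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to () \in \mathcal{R}/\mathcal{A}$:

$$u[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[()] = t.$$

By induction hypothesis, $u[a(h)] \in L(\mathcal{A}')$. Hence there exists a reduction sequence:
$u[a(h)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q^f \in Q_L^f$ with $q_1 \ldots q_n \in L'_{a,q_0}$.

By construction, $\mathcal{A}'$ contains a collapsing transition rule $() \to q_0$. Hence
$t = u[()] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q^f \in Q_L^f$, i.e. $t \in L(\mathcal{A}')$.

$\mathsf{DEL}_s$. The last rewrite step of the sequence involves a rewrite rule of the form
$a(x) \to x \in \mathcal{R}/\mathcal{A}$:

$$u[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[h] = t.$$

By induction hypothesis, $u[a(h)] \in L(\mathcal{A}')$. Hence there exists a reduction sequence:
$u[a(h)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q^f \in Q_L^f$ with $q_1 \ldots q_n \in L'_{a,q_0}$.

By construction, $\mathcal{A}'$ contains a collapsing transition rule $L'_{a,q_0} \to q_0$. Hence
$t = u[h] \xrightarrow[\mathcal{A}']{*} u[q_1 \ldots q_n] \xrightarrow[\mathcal{A}']{} u[q_0] \xrightarrow[\mathcal{A}']{*} q^f \in Q_L^f$, i.e. $t \in L(\mathcal{A}')$.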

(end of the Lemma, direction $\supseteq$) □
(end of the proof of Theorem 1) □



D Appendix: proof of Theorem 3

Theorem 3. Given a HA $\mathcal{A}$ on $\Sigma$ and a PTRS $\mathcal{R}/\mathcal{A} \in \mathsf{XACU}+$, for every HA
language $L$, $pre^*_{\mathcal{R}/\mathcal{A}}(L)$ is a HA language.

Proof. Let $\mathcal{A} = (P, P^f, \Theta)$, and let $\mathcal{A}_L = (Q_L, Q_L^f, \Delta_L)$ be a HA recognizing
$L$; both are assumed normalized. We also assume wlog that $\mathcal{A}_L$ is complete:
for every term $t$, there exists a state $q$ such that $t \in L(\mathcal{A}_L, q)$. As in the proof
of Theorem 1, we assume given, for each $a \in \Sigma$, $q \in Q_L$, a finite automaton
$B_{a,q} = (Q_L, S_{a,q}, i_{a,q}, \{f_{a,q}\}, \Gamma_{a,q})$ recognizing the regular language $L_{a,q}$ in the
transition $a(L_{a,q}) \to q \in \Delta_L$ (assumed unique).

We shall construct a finite sequence of HA $\mathcal{A}_0, \mathcal{A}_1, \ldots, \mathcal{A}_k$ whose
final element's language is $pre^*_{\mathcal{R}/\mathcal{A}}(L)$, where for all $i \le k$, $\mathcal{A}_i = (\Sigma, Q_L, Q_L^f, \Delta_i)$.
For the construction of the transition sets $\Delta_i$, we consider a set $\mathcal{C}$ of finite
automata over $Q_L$ defined as the smallest set such that:

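• $\mathcal{C}$ contains every $B_{a,q}$ for $a \in \Sigma$, $q \in Q_L$,

• for all $B = (Q_L, S, i, F, \Gamma) \in \mathcal{C}$ and all states $s, s' \in S$, the
automaton $B_{s,s'} := (Q_L, S, s, \{s'\}, \Gamma)$ is in $\mathcal{C}$,

• for all $B = (Q_L, S, i, F, \Gamma) \in \mathcal{C}$, $q \in Q_L$ and all states $s, s' \in S$,
the automata $(Q_L, S, i, F, \Gamma \cup \{\langle s, q, s' \rangle\})$ and $(Q_L, S, i, F, \Gamma \cup \{\langle s, \varepsilon, s' \rangle\})$,
respectively denoted by $B + \langle s, q, s' \rangle$ and $B + \langle s, \varepsilon, s' \rangle$, also belong to $\mathcal{C}$.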

Note that $\mathcal{C}$ is finite with this definition. For the sake of conciseness, we make
no distinction below between an NFA $B \in \mathcal{C}$ and the language $L(B)$ recognized
by $B$. Moreover, we assume that every $B \in \mathcal{C}$ has a unique final state denoted
$f_B$ and an initial state denoted $i_B$.
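
The closure operations defining $\mathcal{C}$ can be illustrated with a small sketch. The class name `NFA`, the method names `with_endpoints` and `plus`, and the encoding of $\varepsilon$ as `None` are assumptions made for this illustration only; they do not come from the report.

```python
from dataclasses import dataclass

EPS = None  # label of epsilon-transitions (an encoding choice of this sketch)


@dataclass(frozen=True)
class NFA:
    """Finite automaton over an alphabet of HA states (hypothetical helper)."""
    states: frozenset
    initial: object
    finals: frozenset
    trans: frozenset  # triples (source, label, target)

    def with_endpoints(self, s, s2):
        """B_{s,s'}: same transitions, initial state s, unique final state s'."""
        return NFA(self.states, s, frozenset([s2]), self.trans)

    def plus(self, s, label, s2):
        """B + <s, label, s'>: add one transition; label may be EPS."""
        return NFA(self.states, self.initial, self.finals,
                   self.trans | {(s, label, s2)})

    def _closure(self, current):
        # Saturate a set of states under epsilon-transitions.
        seen, stack = set(current), list(current)
        while stack:
            x = stack.pop()
            for (src, lab, dst) in self.trans:
                if src == x and lab is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def accepts(self, word):
        """Decide whether the word (a sequence of labels) is recognized."""
        current = self._closure({self.initial})
        for sym in word:
            step = {dst for (src, lab, dst) in self.trans
                    if src in current and lab == sym}
            current = self._closure(step)
        return bool(current & self.finals)


# B recognizes the single word q1 q2.
b = NFA(frozenset({0, 1, 2}), 0, frozenset({2}),
        frozenset({(0, "q1", 1), (1, "q2", 2)}))
```

With this `b`, the automaton `b.plus(1, "q", 1)` additionally recognizes `q1 q q2`, and `b.plus(0, EPS, 1)` additionally recognizes `q2` alone. Since no state is ever added, only finitely many automata can be produced from a finite starting set, which is the finiteness remark made above.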

First, we let $\mathcal{A}_0 = \mathcal{A}_L$. The other $\mathcal{A}_i$ are constructed recursively by iteration
of the following case analysis until a fixpoint is reached (only a finite number
of transitions can be added in the construction). In the construction we use an
extension of the move relation of HA, from states to sets of states (single states
are considered as singleton sets): $a(L_1, \ldots, L_n) \xrightarrow[\Delta_i]{} q$ (where $L_1, \ldots, L_n \subseteq Q_L$
and $q \in Q_L$) iff there exists a transition $a(L) \to q \in \Delta_i$ such that $L_1 \ldots L_n \subseteq L$.

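$\mathsf{REN}$: if $a(x) \to b(x) \in \mathcal{R}/\mathcal{A}$, $B \in \mathcal{C}$ and $q \in Q_L$, such that $b(B) \xrightarrow[\Delta_i]{} q$, then let
$\Delta_{i+1} := \Delta_i \cup \{a(B) \to q\}$.

$\mathsf{INS}'_{\mathsf{first}}$: if $a(x) \to b(p\,x) \in \mathcal{R}/\mathcal{A}$, $B \in \mathcal{C}$ and $q, q_p \in Q_L$, such that $L(\mathcal{A}_i, q_p) \cap
L(\mathcal{A}, p) \neq \emptyset$ and $b(q_p B) \xrightarrow[\Delta_i]{} q$, then $\Delta_{i+1} := \Delta_i \cup \{a(B) \to q\}$.

$\mathsf{INS}'_{\mathsf{last}}$: if $a(x) \to b(x\,p) \in \mathcal{R}/\mathcal{A}$, $B \in \mathcal{C}$ and $q, q_p \in Q_L$, such that $L(\mathcal{A}_i, q_p) \cap
L(\mathcal{A}, p) \neq \emptyset$ and $b(B q_p) \xrightarrow[\Delta_i]{} q$, then $\Delta_{i+1} := \Delta_i \cup \{a(B) \to q\}$.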

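$\mathsf{INS}_{\mathsf{into}}$: if $a(xy) \to a(x\,p\,y) \in \mathcal{R}/\mathcal{A}$, $B \in \mathcal{C}$, $s, s'$ are states of $B$, and $q, q_p \in Q_L$,
such that $L(\mathcal{A}_i, q_p) \cap L(\mathcal{A}, p) \neq \emptyset$, $s \xrightarrow[B]{q_p} s'$, and $a(B) \xrightarrow[\Delta_i]{} q$ then
$\Delta_{i+1} := \Delta_i \cup \{a(B + \langle s, \varepsilon, s' \rangle) \to q\}$.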

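$\mathsf{INS}_{\mathsf{left}}$: if $a(x) \to p\,a(x) \in \mathcal{R}/\mathcal{A}$, $b \in \Sigma$, $B, B' \in \mathcal{C}$, $s, s'$ are states of $B$,
and $q, q_p, q' \in Q_L$ such that $b(B) \to q \in \Delta_i$, $a(B') \xrightarrow[\Delta_i]{} q'$, $L(\mathcal{A}_i, q_p) \cap
L(\mathcal{A}, p) \neq \emptyset$, $s \xrightarrow[B]{q_p q'} s'$ then $\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s' \rangle) \to q\}$.

$\mathsf{INS}_{\mathsf{right}}$: if $a(x) \to a(x)\,p \in \mathcal{R}/\mathcal{A}$, $b \in \Sigma$, $B, B' \in \mathcal{C}$, $s, s'$ are states of $B$,
and $q, q_p, q' \in Q_L$ such that $b(B) \to q \in \Delta_i$, $a(B') \xrightarrow[\Delta_i]{} q'$, $L(\mathcal{A}_i, q_p) \cap
L(\mathcal{A}, p) \neq \emptyset$, $s \xrightarrow[B]{q' q_p} s'$ then $\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s' \rangle) \to q\}$.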

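$\mathsf{RPL}'$: if $a(x) \to p_1 \ldots p_n \in \mathcal{R}/\mathcal{A}$, $b \in \Sigma$, $B, B' \in \mathcal{C}$, $s, s'$ are states of $B$, and
$q, q', q_1, \ldots, q_n \in Q_L$ such that $b(B) \to q \in \Delta_i$, $a(B') \xrightarrow[\Delta_i]{} q'$, $L(\mathcal{A}_i, q_j) \cap
L(\mathcal{A}, p_j) \neq \emptyset$ for all $1 \le j \le n$, $s \xrightarrow[B]{q_1 \ldots q_n} s'$ then
$\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s' \rangle) \to q\}$.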

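$\mathsf{DEL}$: if $a(x) \to () \in \mathcal{R}/\mathcal{A}$, $b \in \Sigma$, $B, B' \in \mathcal{C}$, $s$ is a state of $B$, $q, q' \in Q_L$ such
that $b(B) \to q \in \Delta_i$, $a(B') \xrightarrow[\Delta_i]{} q'$, then
$\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s \rangle) \to q\}$.

$\mathsf{DEL}_s$: if $a(x) \to x \in \mathcal{R}/\mathcal{A}$, $b \in \Sigma$, $B \in \mathcal{C}$, $s, s'$ are states of $B$, $q, q' \in Q_L$
such that $b(B) \to q \in \Delta_i$, $a(B_{s,s'}) \xrightarrow[\Delta_i]{} q'$, then
$\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s' \rangle) \to q\}$.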

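Note that $\mathsf{INS}_{\mathsf{first}}$, $\mathsf{INS}_{\mathsf{last}}$, $\mathsf{RPL}$ are special cases of respectively $\mathsf{INS}'_{\mathsf{first}}$, $\mathsf{INS}'_{\mathsf{last}}$,
$\mathsf{RPL}'$. Since no state is added to the original automaton $\mathcal{A}_L$ and all the transitions
added involve horizontal languages of the set $\mathcal{C}$, which is finite, the iteration
of the above operations terminates with an automaton $\mathcal{A}'$. Let us show that
$L(\mathcal{A}') = pre^*_{\mathcal{R}/\mathcal{A}}(L)$.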

Lemma 6. $L(\mathcal{A}') \subseteq pre^*_{\mathcal{R}/\mathcal{A}}(L)$.

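Proof. We show more generally that for all $t \in L(\mathcal{A}', q)$, $q \in Q_L$, there exists
$u \in L(\mathcal{A}_L, q)$ such that $t \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$. The proof is by induction on the measure $\mathcal{M}$
associating to a reduction $t \xrightarrow[\mathcal{A}']{*} q$ the multiset containing, for each transition
rule $\rho \in \Delta_i$ with $i > 0$ used in the reduction, the index $\min\{j > 0 \mid \rho \in \Delta_j\}$.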
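
The text does not spell out how two such measures are compared. A minimal sketch, assuming the standard multiset extension of the order on integer indices (the function name `multiset_less` is hypothetical):

```python
from collections import Counter


def multiset_less(m, n):
    """True iff multiset m is strictly smaller than multiset n in the
    multiset extension of < on integers (assumed ordering for the measure)."""
    m, n = Counter(m), Counter(n)
    if m == n:
        return False
    only_m = m - n  # elements that m has in excess of n
    only_n = n - m  # elements that n has in excess of m
    # Every excess element of m must be dominated by a larger excess element of n.
    return all(any(y > x for y in only_n) for x in only_m)
```

Under this assumption, replacing one use of a transition of index 3 by several uses of transitions of smaller index yields a smaller measure: `multiset_less([2, 1, 1], [3])` holds. This is the kind of decrease invoked in each case of the induction step.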

Base case. If $\mathcal{M}$ is empty, all the transitions are in $\Delta_0$. It means that $t \in
L(\mathcal{A}_L, q)$ and we let $u = t$.

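Induction step. Assume that we have a reduction by $\mathcal{A}'$ of the form

$$t = t[a(h)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q \qquad (2)$$

(with $q_0 \in Q_L$, $q_1 \ldots q_n \in L(B)$) and that the step $t[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0]$
applies a transition $a(B) \to q_0$ added to $\Delta_{i+1}$ for some $i \ge 0$. We analyse the
cases which permitted the addition of this transition to $\Delta_{i+1}$.

$\mathsf{REN}$: the transition $a(B) \to q_0$ was added to $\Delta_{i+1}$ because $a(x) \to b(x) \in \mathcal{R}/\mathcal{A}$
and $b(B) \xrightarrow[\Delta_i]{} q_0$. Hence, there exists a reduction

$$t' = t[b(h)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with a measure $\mathcal{M}$ strictly smaller than for (2), by hypothesis. Therefore, by
induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$ such that $t' \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$. Since
$t = t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[b(h)] = t'$, we conclude that $t \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$.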

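$\mathsf{INS}'_{\mathsf{first}}$: the transition $a(B) \to q_0$ was added to $\Delta_{i+1}$ because $a(x) \to b(p\,x) \in
\mathcal{R}/\mathcal{A}$, with $q_0, q_p \in Q_L$, $L(\mathcal{A}_i, q_p) \cap L(\mathcal{A}, p) \neq \emptyset$ and $b(q_p B) \xrightarrow[\Delta_i]{} q_0$. Hence,
for some $t_p \in L(\mathcal{A}_i, q_p) \cap L(\mathcal{A}, p)$, there exists a reduction

$$t' = t[b(t_p\, h)] \xrightarrow[\mathcal{A}']{*} t[b(q_p\, q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with a measure $\mathcal{M}$ strictly smaller than for (2), by hypothesis. Therefore, by
induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$ such that $t' \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$. Since
$t = t[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[b(t_p\, h)] = t'$, we conclude that $t \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$.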



$\mathsf{INS}'_{\mathsf{last}}$: this case is similar to the previous one.



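$\mathsf{INS}_{\mathsf{into}}$: the transition is $a(B') \to q_0$ and was added to $\Delta_{i+1}$ because $a(xy) \to
a(x\,p\,y) \in \mathcal{R}/\mathcal{A}$, $B \in \mathcal{C}$, $s, s'$ are states of $B$, $q_0, q_p \in Q_L$, such that $L(\mathcal{A}_i, q_p) \cap
L(\mathcal{A}, p) \neq \emptyset$, $s \xrightarrow[B]{q_p} s'$, $a(B) \xrightarrow[\Delta_i]{} q_0$ and $B' = B + \langle s, \varepsilon, s' \rangle$. In this case, let
$t = a(h\ell)$, and assume that the reduction (2) has the form

$$t = t[a(h\ell)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n\, q'_1 \ldots q'_m)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with $q_1 \ldots q_n\, q'_1 \ldots q'_m \in L(B')$ by $i_{B'} \xrightarrow[B']{q_1 \ldots q_n} s \xrightarrow[B']{\varepsilon} s' \xrightarrow[B']{q'_1 \ldots q'_m} f_{B'}$ ($i_{B'}$ and
$f_{B'}$ are resp. the initial and final states of $B'$). Hence, by construction, we have
$i_B \xrightarrow[B]{q_1 \ldots q_n} s \xrightarrow[B]{q_p} s' \xrightarrow[B]{q'_1 \ldots q'_m} f_B$ ($i_{B'} = i_B$ and $f_{B'} = f_B$) and there exists a
reduction

$$t' = t[a(h\, t_p\, \ell)] \xrightarrow[\mathcal{A}']{*} t[a(q_1 \ldots q_n\, q_p\, q'_1 \ldots q'_m)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with a measure $\mathcal{M}$ strictly smaller than for (2), by hypothesis. Therefore, by
induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$ such that $t' \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$. Since
$t = t[a(h\ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[a(h\, t_p\, \ell)] = t'$, we conclude that $t \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$.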

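From now on we assume that the reduction of $t$ by $\mathcal{A}'$ has the form

$$t = t[b(h)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q \qquad (3)$$

with $q_1 \ldots q_n \in L(B'')$, $q_0 \in Q_L$, and that the step $t[b(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} t[q_0]$
applies a transition $b(B'') \to q_0$ added to $\Delta_{i+1}$ for some $i \ge 0$ in one of the five
cases.

$\mathsf{INS}_{\mathsf{left}}$: the transition $b(B'') \to q_0$ was added to $\Delta_{i+1}$ because $a(x) \to p\,a(x) \in
\mathcal{R}/\mathcal{A}$, $B, B' \in \mathcal{C}$, $s, s'$ are states of $B$, $q_0, q_p, q'_0 \in Q_L$, such that $b(B) \to q_0 \in \Delta_i$,
$a(B') \xrightarrow[\Delta_i]{} q'_0$, $L(\mathcal{A}_i, q_p) \cap L(\mathcal{A}, p) \neq \emptyset$, $s \xrightarrow[B]{q_p q'_0} s'$, and $B'' = B + \langle s, q'_0, s' \rangle$. In
this case, let $t = b(h\, a(v)\, \ell)$, and assume that the above reduction (3) has the
form

$$t = t[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_n\, q'_0\, q'_1 \ldots q'_m)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with $q_1 \ldots q_n\, q'_0\, q'_1 \ldots q'_m \in L(B'')$ by $i_{B''} \xrightarrow[B'']{q_1 \ldots q_n} s \xrightarrow[B'']{q'_0} s' \xrightarrow[B'']{q'_1 \ldots q'_m} f_{B''}$ ($i_{B''}$ and
$f_{B''}$ are resp. the initial and final states of $B''$). Hence, by construction, we
have $i_B \xrightarrow[B]{q_1 \ldots q_n} s \xrightarrow[B]{q_p q'_0} s' \xrightarrow[B]{q'_1 \ldots q'_m} f_B$ ($i_{B''} = i_B$ and $f_{B''} = f_B$) and there
exists a reduction

$$t' = t[b(h\, t_p\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_n\, q_p\, q'_0\, q'_1 \ldots q'_m)] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with a measure strictly smaller than for (3), by hypothesis. Therefore, by
induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$ such that $t' \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$. Since
$t = t[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[b(h\, t_p\, a(v)\, \ell)] = t'$, we conclude that $t \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$.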

$\mathsf{INS}_{\mathsf{right}}$: this case is similar to the previous one.

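$\mathsf{RPL}'$: the transition $b(B'') \to q_0$ has been added to $\Delta_{i+1}$ because $a(x) \to
p_1 \ldots p_n \in \mathcal{R}/\mathcal{A}$, $B, B' \in \mathcal{C}$, $s, s'$ are states of $B$, $q_0, q'_0, q_{p_1}, \ldots, q_{p_n} \in Q_L$ such
that $b(B) \to q_0 \in \Delta_i$, $a(B') \xrightarrow[\Delta_i]{} q'_0$, $L(\mathcal{A}_i, q_{p_j}) \cap L(\mathcal{A}, p_j) \neq \emptyset$ for all $j \le n$,
$s \xrightarrow[B]{q_{p_1} \ldots q_{p_n}} s'$, and $B'' = B + \langle s, q'_0, s' \rangle$. In this case, let $t = b(h\, a(v)\, \ell)$, and
assume that the above reduction (3) has the form

$$t = t[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_m\, q'_0\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with $q_1 \ldots q_m\, q'_0\, q'_1 \ldots q'_{m'} \in L(B'')$ by $i_{B''} \xrightarrow[B'']{q_1 \ldots q_m} s \xrightarrow[B'']{q'_0} s' \xrightarrow[B'']{q'_1 \ldots q'_{m'}} f_{B''}$ ($i_{B''}$
and $f_{B''}$ are resp. the initial and final states of $B''$). Hence, by construction, we
have $i_B \xrightarrow[B]{q_1 \ldots q_m} s \xrightarrow[B]{q_{p_1} \ldots q_{p_n}} s' \xrightarrow[B]{q'_1 \ldots q'_{m'}} f_B$ ($i_{B''} = i_B$ and $f_{B''} = f_B$) and there
exists a reduction, with $t_j \in L(\mathcal{A}_i, q_{p_j}) \cap L(\mathcal{A}, p_j)$ for all $j \le n$,

$$t' = t[b(h\, t_1 \ldots t_n\, \ell)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_m\, q_{p_1} \ldots q_{p_n}\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with a measure $\mathcal{M}$ strictly smaller than for (3), by hypothesis. Therefore, by
induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$ such that $t' \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$. Since
$t = t[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[b(h\, t_1 \ldots t_n\, \ell)] = t'$, using the rule $a(x) \to p_1 \ldots p_n$,
we conclude that $t \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$.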

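$\mathsf{DEL}$: the transition $b(B'') \to q_0$ has been added to $\Delta_{i+1}$ because $a(x) \to () \in
\mathcal{R}/\mathcal{A}$, $B, B' \in \mathcal{C}$, $s$ is a state of $B$, $q_0, q'_0 \in Q_L$, such that $b(B) \to q_0 \in \Delta_i$,
$a(B') \xrightarrow[\Delta_i]{} q'_0$, and $B'' = B + \langle s, q'_0, s \rangle$. In this case, let $t = b(h\, a(v)\, \ell)$, and
assume that the above reduction (3) has the form

$$t = t[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_m\, q'_0\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with $q_1 \ldots q_m\, q'_0\, q'_1 \ldots q'_{m'} \in L(B'')$ by $i_{B''} \xrightarrow[B'']{q_1 \ldots q_m} s \xrightarrow[B'']{q'_0} s \xrightarrow[B'']{q'_1 \ldots q'_{m'}} f_{B''}$ ($i_{B''}$
and $f_{B''}$ are resp. the initial and final states of $B''$). Hence, by construction, we
have $i_B \xrightarrow[B]{q_1 \ldots q_m} s \xrightarrow[B]{q'_1 \ldots q'_{m'}} f_B$ ($i_{B''} = i_B$ and $f_{B''} = f_B$) and there exists a
reduction

$$t' = t[b(h\ell)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_m\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with a measure $\mathcal{M}$ strictly smaller than for (3), by hypothesis. Therefore, by
induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$ such that $t' \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$. Since
$t = t[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[b(h\ell)] = t'$, we conclude that $t \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$.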

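$\mathsf{DEL}_s$: the transition $b(B'') \to q_0$ has been added to $\Delta_{i+1}$ because $a(x) \to x \in
\mathcal{R}/\mathcal{A}$, $B \in \mathcal{C}$, $s, s'$ are states of $B$, $q_0, q'_0 \in Q_L$, such that $b(B) \to q_0 \in \Delta_i$,
$a(B_{s,s'}) \xrightarrow[\Delta_i]{} q'_0$, and $B'' = B + \langle s, q'_0, s' \rangle$. In this case, let $t = b(h\, a(v)\, \ell)$, and
assume that the above reduction (3) has the form

$$t = t[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_m\, a(v_1 \ldots v_k)\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} t[b(q_1 \ldots q_m\, q'_0\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with $q_1 \ldots q_m\, q'_0\, q'_1 \ldots q'_{m'} \in L(B'')$ by $i_{B''} \xrightarrow[B'']{q_1 \ldots q_m} s \xrightarrow[B'']{q'_0} s' \xrightarrow[B'']{q'_1 \ldots q'_{m'}} f_{B''}$ ($i_{B''}$
and $f_{B''}$ are resp. the initial and final states of $B''$) and $s \xrightarrow[B]{v_1 \ldots v_k} s'$.

Hence, by construction, we have $i_B \xrightarrow[B]{q_1 \ldots q_m} s \xrightarrow[B]{v_1 \ldots v_k} s' \xrightarrow[B]{q'_1 \ldots q'_{m'}} f_B$ ($i_{B''} =
i_B$ and $f_{B''} = f_B$) and there exists a reduction

$$t' = t[b(h\, v\, \ell)] \xrightarrow[\mathcal{A}']{*} t[b(q_1 \ldots q_m\, v_1 \ldots v_k\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} t[q_0] \xrightarrow[\mathcal{A}']{*} q$$

with a measure $\mathcal{M}$ strictly smaller than for (3), by hypothesis. Therefore, by
induction hypothesis, there exists $u \in L(\mathcal{A}_L, q)$ such that $t' \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$. Since
$t = t[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} t[b(h\, v\, \ell)] = t'$, we conclude that $t \xrightarrow[\mathcal{R}/\mathcal{A}]{*} u$.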

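Note that $\mathsf{INS}_{\mathsf{first}}$, $\mathsf{INS}_{\mathsf{last}}$, $\mathsf{RPL}$ were not considered above because they are
special cases of respectively $\mathsf{INS}'_{\mathsf{first}}$, $\mathsf{INS}'_{\mathsf{last}}$, $\mathsf{RPL}'$.

(end of Lemma 6, direction $\subseteq$) □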

Lemma 7. $L(\mathcal{A}') \supseteq pre^*_{\mathcal{R}/\mathcal{A}}(L)$.

Proof. We show that for all $t \in L$, if $u \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$, then $u \in L(\mathcal{A}')$, by induction
on the length of the rewrite sequence.

Base case (0 rewrite steps). In this case, $u = t \in L$ and we are done since
$L = L(\mathcal{A}_L) \subseteq L(\mathcal{A}')$ by construction.

Induction step. Assume that $u \xrightarrow[\mathcal{R}/\mathcal{A}]{+} t$; we analyse the type of rewrite rule
used in the first rewrite step.

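$\mathsf{REN}$. Assume that $u = u[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[b(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$. By induction
hypothesis, $u_1 = u[b(h)] \in L(\mathcal{A}')$, i.e. there exists a reduction sequence
$u_1 = u[b(h)] \xrightarrow[\mathcal{A}']{*} u[b(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q] \xrightarrow[\mathcal{A}']{*} q^f$ where $q, q_1, \ldots, q_n \in Q_L$,
$q^f \in Q_L^f$, and a transition $a(B) \to q$ has been added to $\mathcal{A}'$, with $q_1 \ldots q_n \in B$.
It follows that $u = u[a(h)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q] \xrightarrow[\mathcal{A}']{*} q^f$, hence that
$u \in L(\mathcal{A}')$.

$\mathsf{INS}'_{\mathsf{first}}$. Assume that $u = u[a(h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[b(t_p\, h)] \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$ for some $t_p \in
L(\mathcal{A}, p)$. By induction hypothesis, $u_1 = u[b(t_p\, h)] \in L(\mathcal{A}')$, i.e. there exists a
reduction sequence

$$u[b(t_p\, h)] \xrightarrow[\mathcal{A}']{*} u[b(q_p\, q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q] \xrightarrow[\mathcal{A}']{*} q^f$$

where $q, q_p, q_1, \ldots, q_n \in Q_L$, $q^f \in Q_L^f$. Hence $L(\mathcal{A}', q_p) \cap L(\mathcal{A}, p)$ is not empty
because it contains $t_p$, and a transition $a(B) \to q$ has been added to $\mathcal{A}'$, with
$q_1 \ldots q_n \in B$. It follows that $u = u[a(h)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_n)] \xrightarrow[\mathcal{A}']{} u[q] \xrightarrow[\mathcal{A}']{*} q^f$,
hence that $u \in L(\mathcal{A}')$.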

$\mathsf{INS}'_{\mathsf{last}}$. This case is similar to the previous one.

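$\mathsf{INS}_{\mathsf{into}}$. Assume that $u = u[a(h\ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[a(h\, t_p\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$ for some $t_p \in
L(\mathcal{A}, p)$. By induction hypothesis, $u_1 = u[a(h\, t_p\, \ell)] \in L(\mathcal{A}')$, i.e. there exists a
reduction sequence

$$u_1 = u[a(h\, t_p\, \ell)] \xrightarrow[\mathcal{A}']{*} u[a(q_1 \ldots q_m\, q_p\, q'_1 \ldots q'_n)] \xrightarrow[\mathcal{A}']{\rho} u[q] \xrightarrow[\mathcal{A}']{*} q^f$$

where $q, q_p, q_1, \ldots, q_m, q'_1, \ldots, q'_n \in Q_L$ and $q^f \in Q_L^f$. Hence $L(\mathcal{A}', q_p) \cap L(\mathcal{A}, p)$
is not empty because it contains $t_p$, and the transition rule denoted $\rho$ in the
above sequence has the form $a(B) \to q$, where $q_1 \ldots q_m\, q_p\, q'_1 \ldots q'_n$ is recognized
by $B$, with a sequence $i_B \xrightarrow[B]{q_1 \ldots q_m} s \xrightarrow[B]{q_p} s' \xrightarrow[B]{q'_1 \ldots q'_n} f_B$ for some states $s, s'$
of $B$. Therefore, a transition $a(B + \langle s, \varepsilon, s' \rangle) \to q$ has been added to $\mathcal{A}'$, and
$q_1 \ldots q_m\, q'_1 \ldots q'_n$ is recognized by $B + \langle s, \varepsilon, s' \rangle$. It follows that $u = u[a(h\ell)] \xrightarrow[\mathcal{A}']{*}
u[a(q_1 \ldots q_m\, q'_1 \ldots q'_n)] \xrightarrow[\mathcal{A}']{} u[q] \xrightarrow[\mathcal{A}']{*} q^f$, hence that $u \in L(\mathcal{A}')$.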

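$\mathsf{INS}_{\mathsf{left}}$. Assume that $u = u[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[b(h\, t_p\, a(v)\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$ for some
$t_p \in L(\mathcal{A}, p)$. By induction hypothesis, $u_1 = u[b(h\, t_p\, a(v)\, \ell)] \in L(\mathcal{A}')$, i.e. there
exists a reduction sequence

$$u[b(h\, t_p\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*} u[b(q_1 \ldots q_m\, q_p\, q'\, q'_1 \ldots q'_n)] \xrightarrow[\mathcal{A}']{\rho} u[q] \xrightarrow[\mathcal{A}']{*} q^f$$

where $q, q', q_p, q_1, \ldots, q_m, q'_1, \ldots, q'_n \in Q_L$, $q^f \in Q_L^f$, and $a(v) \xrightarrow[\mathcal{A}']{*} q'$. Hence
$L(\mathcal{A}', q_p) \cap L(\mathcal{A}, p)$ is not empty because it contains $t_p$, and the transition rule
denoted $\rho$ in the above sequence has the form $b(B) \to q$, where $q_1 \ldots q_m\, q_p\, q'\, q'_1 \ldots q'_n$
is recognized by $B$, with a sequence $i_B \xrightarrow[B]{q_1 \ldots q_m} s \xrightarrow[B]{q_p q'} s' \xrightarrow[B]{q'_1 \ldots q'_n} f_B$ for some
states $s$ and $s'$ of $B$. Hence, a transition $b(B + \langle s, q', s' \rangle) \to q$ has been
added to $\mathcal{A}'$, and $q_1 \ldots q_m\, q'\, q'_1 \ldots q'_n$ is recognized by $B + \langle s, q', s' \rangle$. It follows
that $u = u[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*} u[b(q_1 \ldots q_m\, q'\, q'_1 \ldots q'_n)] \xrightarrow[\mathcal{A}']{} u[q] \xrightarrow[\mathcal{A}']{*} q^f$, hence that
$u \in L(\mathcal{A}')$.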

$\mathsf{INS}_{\mathsf{right}}$. This case is similar to the previous one.

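$\mathsf{RPL}'$. Assume that $u = u[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[b(h\, t_1 \ldots t_n\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$ for some
$t_1, \ldots, t_n$ respectively in $L(\mathcal{A}, p_1), \ldots, L(\mathcal{A}, p_n)$. By induction hypothesis, $u_1 =
u[b(h\, t_1 \ldots t_n\, \ell)] \in L(\mathcal{A}')$, i.e. there exists a reduction sequence

$$u[b(h\, t_1 \ldots t_n\, \ell)] \xrightarrow[\mathcal{A}']{*} u[b(q_1 \ldots q_m\, q_{p_1} \ldots q_{p_n}\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{\rho} u[q] \xrightarrow[\mathcal{A}']{*} q^f$$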

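where $q, q_{p_1}, \ldots, q_{p_n}, q_1, \ldots, q_m, q'_1, \ldots, q'_{m'} \in Q_L$, $q^f \in Q_L^f$, and for all $j \le n$,
$L(\mathcal{A}', q_{p_j}) \cap L(\mathcal{A}, p_j)$ contains $t_j$, and the transition rule denoted $\rho$ in the above
sequence has the form $b(B) \to q$ with $q_1 \ldots q_m\, q_{p_1} \ldots q_{p_n}\, q'_1 \ldots q'_{m'} \in L(B)$,
with a sequence $i_B \xrightarrow[B]{q_1 \ldots q_m} s \xrightarrow[B]{q_{p_1} \ldots q_{p_n}} s' \xrightarrow[B]{q'_1 \ldots q'_{m'}} f_B$, for some states $s$ and
$s'$ of $B$. Let $q' \in Q_L$ be such that $a(v) \xrightarrow[\mathcal{A}']{*} q'$. By construction, a transition
$b(B + \langle s, q', s' \rangle) \to q$ has been added to $\mathcal{A}'$, and $q_1 \ldots q_m\, q'\, q'_1 \ldots q'_{m'}$
is recognized by $B + \langle s, q', s' \rangle$. It follows that $u = u[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*}
u[b(q_1 \ldots q_m\, q'\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} u[q] \xrightarrow[\mathcal{A}']{*} q^f$, hence that $u \in L(\mathcal{A}')$.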

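$\mathsf{DEL}$. Assume that $u = u[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[b(h\ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$. By induction
hypothesis, $u_1 = u[b(h\ell)] \in L(\mathcal{A}')$, i.e. there exists a reduction sequence

$$u[b(h\ell)] \xrightarrow[\mathcal{A}']{*} u[b(q_1 \ldots q_m\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{\rho} u[q] \xrightarrow[\mathcal{A}']{*} q^f$$

where $q, q_1, \ldots, q_m, q'_1, \ldots, q'_{m'} \in Q_L$ and $q^f \in Q_L^f$. The transition rule denoted $\rho$
in the above sequence has the form $b(B) \to q$ and $q_1 \ldots q_m\, q'_1 \ldots q'_{m'}$ is recognized
by $B$ with a sequence $i_B \xrightarrow[B]{q_1 \ldots q_m} s \xrightarrow[B]{q'_1 \ldots q'_{m'}} f_B$, where $s$ is a state of $B$. Let $q' \in
Q_L$ be such that $a(v) \xrightarrow[\mathcal{A}']{*} q'$. By construction, a transition $b(B + \langle s, q', s \rangle) \to q$
has been added to $\mathcal{A}'$, and $q_1 \ldots q_m\, q'\, q'_1 \ldots q'_{m'}$ is recognized by $B + \langle s, q', s \rangle$.
It follows that $u = u[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*} u[b(q_1 \ldots q_m\, q'\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} u[q] \xrightarrow[\mathcal{A}']{*} q^f$,
hence that $u \in L(\mathcal{A}')$.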



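$\mathsf{DEL}_s$. Assume that $u = u[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{} u[b(h\, v\, \ell)] \xrightarrow[\mathcal{R}/\mathcal{A}]{*} t$. By induction
hypothesis, $u_1 = u[b(h\, v\, \ell)] \in L(\mathcal{A}')$, i.e. there exists a reduction sequence

$$u[b(h\, v\, \ell)] \xrightarrow[\mathcal{A}']{*} u[b(q_1 \ldots q_m\, q''_1 \ldots q''_k\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{\rho} u[q] \xrightarrow[\mathcal{A}']{*} q^f$$

where $q, q_1, \ldots, q_m, q''_1, \ldots, q''_k, q'_1, \ldots, q'_{m'} \in Q_L$ and $q^f \in Q_L^f$. The transition
rule denoted $\rho$ in the above sequence has the form $b(B) \to q$ and
$q_1 \ldots q_m\, q''_1 \ldots q''_k\, q'_1 \ldots q'_{m'}$ is recognized by $B$, with a sequence
$i_B \xrightarrow[B]{q_1 \ldots q_m} s \xrightarrow[B]{q''_1 \ldots q''_k} s' \xrightarrow[B]{q'_1 \ldots q'_{m'}} f_B$, where $s, s'$ are two states of $B$. By completeness of
$\mathcal{A}_L$, given $s, s'$, there exists $q'$ such that $a(B_{s,s'}) \to q'$. It follows in particular
that $a(v) \xrightarrow[\mathcal{A}']{*} q'$. By construction, a transition $b(B + \langle s, q', s' \rangle) \to q$ has been
added to $\mathcal{A}'$, and $q_1 \ldots q_m\, q'\, q'_1 \ldots q'_{m'}$ is recognized by $B + \langle s, q', s' \rangle$. It follows
that $u = u[b(h\, a(v)\, \ell)] \xrightarrow[\mathcal{A}']{*} u[b(q_1 \ldots q_m\, q'\, q'_1 \ldots q'_{m'})] \xrightarrow[\mathcal{A}']{} u[q] \xrightarrow[\mathcal{A}']{*} q^f$, hence
that $u \in L(\mathcal{A}')$.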

(end of Lemma 7, direction $\supseteq$) □
(end of the proof of Theorem 3) □



E Appendix: proof of Theorem 4

Theorem 4. Reachability is undecidable for uniform PGTRS without variables
and parameters.

Proof. We reduce from the halting problem of deterministic Turing machines
(TM) that work on half a tape (unbounded on the right). We consider the
following unary symbols to represent the tape alphabet $\Sigma = \{0, 1, \sharp, \flat\}$. We
need a copy of the alphabet $\Sigma' = \{0', 1', \sharp', \flat'\}$. We only use $\sharp$ to mark the left
endpoint of the tape and $\flat$ is the blank symbol, e.g. representing the rightmost
part of the tape.

The state symbols are constants in a finite set $Q \cup Q'$ where $Q =
\{q_1, q_2, \ldots, q_n\}$ and $Q' = \{q'_1, q'_2, \ldots, q'_n\}$. Hence each state of the TM has
two representations.

In order to represent a Turing machine configuration as a ground term we
shall introduce a binary symbol $+$ and a nullary symbol $\bot$. Now the TM
configuration with tape $abccde\flat\flat\ldots$, symbol under head $d$, state $q$ will be represented
by:

$$\sharp(\bot) + (a(\bot) + (b(\bot) + (c(\bot) + (c(\bot) + (d(q) + (e(\bot) + \flat(\bot))))))).$$
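
As an illustration of this encoding, the following sketch builds the term as a string. The function name `encode`, its parameters, the 0-based head position and the single trailing blank cell are assumptions of this sketch, not notation from the proof.

```python
def encode(tape, head, state, left="♯", blank="♭", bottom="⊥"):
    """Encode a TM configuration as a right-nested term written as a string.

    tape  : visited tape content, without the left endmarker
    head  : 0-based position of the head within `tape`
    state : name of the current state, placed under the scanned symbol
    """
    cells = [left] + list(tape) + [blank]
    # Each cell is a unary symbol applied to the state (under the head)
    # or to the bottom constant (everywhere else).
    leaves = [f"{c}({state if i == head + 1 else bottom})"
              for i, c in enumerate(cells)]
    term = leaves[-1]
    for k, leaf in enumerate(reversed(leaves[:-1])):
        # The innermost sum needs no extra parentheses around its right operand.
        term = f"{leaf} + {term}" if k == 0 else f"{leaf} + ({term})"
    return term


example = encode("abccde", 4, "q")
```

Here `example` is the string form of the term displayed above: the head is on `d` (position 4), so `d(q)` is the only leaf carrying a state symbol, and every other cell carries `⊥`.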

We denote by $\mathcal{T}_0$ (resp. $\mathcal{T}_1$) the set of terms on signature $\Sigma \cup \{\bot, +\}$ with no
occurrence of $\sharp$ (resp. with a unique occurrence of $\sharp$ at position 1). Given a
term $t \in \mathcal{T}_1$ and a term $s \in \mathcal{T}(\Sigma)$ we write $t[\bot \leftarrow s]$ for the term obtained from
$t$ by replacing its rightmost $\bot$ symbol by $s$.

For each TM transition we introduce some rewrite rules that simulate it on
the term representation. We now introduce some regular tree languages: $L_{s,a}$ is
the subset of $t \in \mathcal{T}(\Sigma)$ such that $t$ admits a single occurrence of a state symbol
and this state symbol is $s$, and it occurs right below a symbol $a$.

"In state $q$ reading $a$ go to state $r$ and write $b$". This is translated to the ground
rewrite rule:

$$L_{q,a} :: a(q) \to b(r)$$

"In state $q$ reading $a$ go to state $r$ and move right". This can be simulated by
some application of rules:

$$L_{q,a} :: u(\bot) \to u(r') \quad \text{for all } u \in \{0, 1, \sharp\} \qquad (4)$$
$$L_{q,a} :: \flat(\bot) \to \flat(r') + \flat(\bot) \qquad (5)$$

Note that one of these rule applications may create a pattern $a(q) + (b(r') + x)$
at the location where we had a pattern $a(q) + (b(\bot) + x)$ in the configuration.
Let $L_{q,a,r,R}$ be the set of terms of type $U[\bot \leftarrow (a(q) + (b(r') + V))]$ where $U \in \mathcal{T}_1$,
$V \in \mathcal{T}_0$. This is clearly a regular language. Then we add the rules:

$$L_{q,a,r,R} :: a(q) \to a(\bot) \qquad (6)$$
$$L_{r',u} :: u(r') \to u(r) \quad \text{for all } u \in \{0, 1, \flat\} \qquad (7)$$

"In state $q$ reading $a$ go to state $r$ and move left". This can be simulated by
some application of rules:

$$L_{q,a} :: u(\bot) \to u(r') \quad \text{for all } u \in \{0, 1, \sharp\} \qquad (8)$$

This rule application may create a pattern $b(r') + (a(q) + x)$ at the location
where we had a pattern $b(\bot) + (a(q) + x)$ in the configuration. Let $L_{q,a,r,L}$ be
the set of terms of type $U[\bot \leftarrow (b(r') + (a(q) + V))]$ where $U \in \mathcal{T}_1$, $V \in \mathcal{T}_0$. This
is clearly a regular language. Then we add the rules:

$$L_{q,a,r,L} :: a(q) \to a(\bot) \qquad (9)$$
$$L_{r',u} :: u(r') \to u(r) \quad \text{for all } u \in \{0, 1, \sharp\} \qquad (10)$$

Let us denote by $\mathcal{R} = \{L_i :: \ell_i \to r_i \mid 1 \le i \le n\}$ the set of rules we obtain by the
above construction. Note that the languages $L_i$ are pairwise disjoint. By case
inspection we can show that for any pair of TM configurations $T_1$, $T_2$ and
their respective term encodings $t_1$, $t_2$, there is a sequence of transitions from $T_1$
to $T_2$ iff $t_1 \xrightarrow[\mathcal{R}]{*} t_2$. If we replace in every rule the regular language $L_i$ by the
disjoint union $\biguplus_{1 \le j \le n} L_j$, the result still holds. The theorem follows. □



F Appendix: proof of Theorem 5

Theorem 5. Given a HA $\mathcal{A}$ on $\Sigma$ and a PTRS $\mathcal{R}/\mathcal{A} \in \mathsf{XACU}_2+$, for every HA
language $L$, $pre^*_{\mathcal{R}/\mathcal{A}}(L)$ is a HA language.

Proof. The proof is very close to that of Theorem 3. Indeed, in the above
construction for Theorem 3, we consider the applications of rules $\mathsf{INS}_{\mathsf{left}}$, $\mathsf{INS}_{\mathsf{right}}$,
$\mathsf{RPL}'$, $\mathsf{DEL}$ and $\mathsf{DEL}_s$ under any symbol $b \in \Sigma$. Here instead, we can restrict
the construction to the application under the symbol specified in the lhs of the
rewrite rules. More precisely, let us just detail below the cases of the
construction which are modified. The rest of the proof is the same as for Theorem 3.

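$\mathsf{INS}_{2,\mathsf{left}}$: if $b(y\, a(x)\, z) \to b(y\, p\, a(x)\, z) \in \mathcal{R}/\mathcal{A}$, $B, B' \in \mathcal{C}$, $s, s'$ are states of $B$,
and $q, q_p, q' \in Q_L$ such that $b(B) \to q \in \Delta_i$, $a(B') \xrightarrow[\Delta_i]{} q'$, $L(\mathcal{A}_i, q_p) \cap
L(\mathcal{A}, p) \neq \emptyset$, $s \xrightarrow[B]{q_p q'} s'$ then $\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s' \rangle) \to q\}$.

$\mathsf{INS}_{2,\mathsf{right}}$: if $b(y\, a(x)\, z) \to b(y\, a(x)\, p\, z) \in \mathcal{R}/\mathcal{A}$, $B, B' \in \mathcal{C}$, $s, s'$ are states of $B$,
and $q, q_p, q' \in Q_L$ such that $b(B) \to q \in \Delta_i$, $a(B') \xrightarrow[\Delta_i]{} q'$, $L(\mathcal{A}_i, q_p) \cap
L(\mathcal{A}, p) \neq \emptyset$, $s \xrightarrow[B]{q' q_p} s'$ then $\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s' \rangle) \to q\}$.

$\mathsf{RPL}_2$: if $b(y\, a(x)\, z) \to b(y\, p_1 \ldots p_n\, z) \in \mathcal{R}/\mathcal{A}$, $B, B' \in \mathcal{C}$, $s, s'$ are states of
$B$, and $q, q', q_1, \ldots, q_n \in Q_L$ such that $b(B) \to q \in \Delta_i$, $a(B') \xrightarrow[\Delta_i]{} q'$,
$L(\mathcal{A}_i, q_j) \cap L(\mathcal{A}, p_j) \neq \emptyset$ for all $1 \le j \le n$, $s \xrightarrow[B]{q_1 \ldots q_n} s'$ then
$\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s' \rangle) \to q\}$.

$\mathsf{DEL}_2$: if $b(y\, a(x)\, z) \to b(y\, z) \in \mathcal{R}/\mathcal{A}$, $B, B' \in \mathcal{C}$, $s$ is a state of $B$, $q, q' \in Q_L$
such that $b(B) \to q \in \Delta_i$, $a(B') \xrightarrow[\Delta_i]{} q'$, then
$\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s \rangle) \to q\}$.

$\mathsf{DEL}_{2,s}$: if $b(y\, a(x)\, z) \to b(y\, x\, z) \in \mathcal{R}/\mathcal{A}$, $B \in \mathcal{C}$, $s, s'$ are states of $B$, $q, q' \in Q_L$
such that $b(B) \to q \in \Delta_i$, $a(B_{s,s'}) \xrightarrow[\Delta_i]{} q'$, then
$\Delta_{i+1} := \Delta_i \cup \{b(B + \langle s, q', s' \rangle) \to q\}$. □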